Preface


During recent years a vast body of knowledge central to the problems 
of communication engineering has accumulated piecemeal in the journal 
literature. Unfortunately, this work is often couched in advanced mathe- 
matical terms, and no over-all synthesis at the level of an introductory 
textbook has been available. As a result, even at second glance, the dis- 
ciplines of coding and modulation often appear to be distinct and the 
abstractions of information theory to be only vaguely connected with 
the realities of communication system design. 

We hope that this book will provide a cohesive introduction to much 
of this apparently disparate work. We have been motivated by three 
related objectives. The first is to establish a sound frame of reference for 
further study in communication, random processes, and information 
and detection theory. The second is to make the central results and con- 
cepts of statistical communication theory accessible and intuitively 
meaningful to the practicing engineer. The third is to illuminate the
engineering significance and application of the theory and to provide a 
quantitative basis for the compromises of engineering design. 

Book content and scope reflect these objectives. The subject matter 
progresses systematically from elements of probability and random 
process theory through signal detection and selection, modulation and 
coding, demodulation and decoding, and engineering compromises. 
Unity is sought through consistent exploitation of the geometric concepts 
of Shannon and Kotel’nikov, which place clearly in evidence the inter- 
relations among such phenomena as the incidence of threshold with 
“twisted” and “sampled and quantized” modulation systems. 

The development of the subject matter is almost entirely self-contained 
and does not demand mathematics more sophisticated than now en- 
countered in an undergraduate electrical engineering curriculum. We 
presume that the reader has a thorough grasp of Fourier and linear 
systems theory — that he is able not only to write down but also to evaluate 
a convolution integral — and that he has been exposed to complex inte- 
gration. Prior knowledge of linear algebra and probability theory is 
helpful but not necessary. In those half dozen instances in which theorems
must be invoked whose formal proof exceeds the level of the text, an effort
has been made to make their meaning plausible as well as plain. We also 
presume that the reader is already well founded in electronic circuits, 
which we do not discuss. 

Although the mathematical level of the book is intentionally con- 
strained, the intellectual level of the subject matter is not. Indeed, although 
the book begins at a quite elementary level, later chapters treat many 
topics that lie near the forefront of current communication research and 
incorporate certain results that have not previously been published. The 
early chapters are presented in a way that leads naturally into the deeper 
material of the later chapters, even though a less general presentation 
might suffice if an open-ended treatment were not desired. 

To some extent depth of treatment has been facilitated by new and less 
formidable derivations of well-known results. To a larger extent, however, 
it has required restricting consideration to communication models that are 
mathematically tractable. The premise is that complex ideas are best con- 
veyed in the simplest possible context. Thus the book is primarily concerned 
with Gaussian channel disturbances and performance bounds obtainable 
from union arguments. Extension to more general channels and tighter 
bounds requires additional technique but little that is new in the way of 
concept. 

The selection and treatment of the subject matter reflects our bias as 
well as our objectives. For example, although coding is not an eco- 
nomically viable solution in many engineering environments, in certain 
others it appears to be the most attractive solution. We feel in consequence 
that a communication engineer needs to appreciate the operating char- 
acteristics, capabilities, and limitations of coding. An entire chapter is 
therefore devoted to a study of coding and decoding implementation. 

The scope of the book is adequate to span a two-semester sequence of 
first-year graduate instruction, and the subject matter has been arranged 
with such a course in mind. A natural division is to cover Chapters 1 
to 5 in the first semester and Chapters 6 to 8 in the second. This progres- 
sion provides a unified and extensive treatment of digital communication 
before consideration of the mathematical and conceptual issues of con- 
tinuous modulation, which are inherently more subtle. 

The first five chapters may also be used alone as a self-contained one- 
semester introduction to data communication. An alternative one- 
semester course comprises Chapters 1 to 4 plus the first half of Chapter 7 
and the first two thirds of Chapter 8. The latter sequence has the ad- 
vantage of including some continuous modulation theory but forfeits 
the central idea that error-free communication is attainable even when a 
channel is noisy. 

Either one-semester configuration may be used as a senior honors
course for undergraduates who are seriously interested in communica- 
tions; successively revised versions of Chapters 1 to 5 have been taught 
at the Massachusetts Institute of Technology to seniors by nine different 
faculty members during the four years of manuscript preparation. On 
the other hand, Chapter 6 and the last parts of Chapters 7 and 8 seem 
distinctly graduate in character. Most of the problems at the end of 
each chapter are relatively deep and many extend the material of the 
text. We anticipate that instructors teaching undergraduates will wish 
to supplement these problems with others designed for purposes of drill. 

No book is written in a vacuum, but we feel a special debt to our 
colleagues. The intellectual mainsprings of this work stem from the 
pioneering research of V. A. Kotel’nikov, C. E. Shannon, R. M. Fano,
and P. Elias. To the last three we are indebted not only for their work 
but also for their inspired teaching, generous counsel, and constant 
encouragement. Several of the recent refinements and extensions of the 
theory are attributable to R. G. Gallager. Valuable suggestions were 
received from W. B. Davenport, W. M. Siebert, B. Reiffen, H. A. Van 
Trees, D. A. Sakrison, R. S. Kennedy, I. G. Stiglitz, V. R. Algazi, T. S. 
Huang, A. M. Manders, H. A. Yudkin, and J. E. Savage. In addition,
both of us have benefited immeasurably from our association with the 
M. I. T. Lincoln Laboratory, at which the experimental work discussed in 
Chapter 6 was performed under the direction of P. Rosen and I. L. Lebow. 

Deborah Brunetto, Barbara Johnson, Marilyn Pierce, Elaine Geller, 
and Louise Juliano typed and retyped the manuscript through innumerable 
revisions. Helen Thomas generously edited and D. G. Forney, Jr.,
carefully proofread the final version. Most of the computations were 
programmed by Martha Aitken. 

We are grateful to the Department of the Army and to the National 
Aeronautics and Space Administration for partial support of the research 
reported herein. Manuscript preparation was supported in part by a 
grant made to the Massachusetts Institute of Technology by the Ford 
Foundation for the purpose of aiding in the improvement of engineering 
education. Lastly, to our students and associates in the Research Labo- 
ratory of Electronics and Department of Electrical Engineering at the 
Massachusetts Institute of Technology we owe an unrepayable debt for 
stimulation and opportunity. 

J. M. WOZENCRAFT 
Irwin M. Jacobs 

Cambridge, Massachusetts
June 1965 


Introduction


Today the world is spanned by a web of electrical circuits that permits 
near-instantaneous communication over vast distances. This book is 
concerned with the fundamental principles underlying the engineering of 
these communication links. In particular, it provides an introduction to 
what communication technology can, and cannot, accomplish. 


1.1 HISTORICAL SKETCH 

The development of communication technology has proceeded in step 
with the development of electrical technology as a whole. Few indeed are 
the innovations that have not found almost immediate communication 
application. For example, the demonstration of telegraphy by Joseph 
Henry in 1832 and by Samuel F. B. Morse in 1838 followed hard on the 
discovery of electromagnetism by Oersted and Ampere early in the 1820’s.
Similarly, Hertz’s verification late in the 1880’s of Maxwell’s postulation 
(1873) predicting the wireless propagation of electromagnetic energy led 
within 10 years to the radio-telegraph experiments of Marconi and Popov. 
The invention of the diode by Fleming in 1904 and of the triode amplifier 
by de Forest in 1906 made possible the rapid development of long-distance 
telephony, both by radio and wire. 

In recent times the coin has often been reversed. The instantaneous 
success of the telephone, patented by Alexander Graham Bell in 1876,
created an insatiable demand for communication which in turn has
stimulated innumerable fundamental advances in electrical technology.
For instance, the invention of the wave filter by G. A. Campbell in 1917
came in response to the need for transmitting many different conversations
simultaneously over a single telephone line.

Communication technology may be broken conveniently into three 
interacting parts: the signal-processing operations performed, the devices
that perform these operations, and the underlying physics. Although it is 
to the first of these areas that this book is directed, it is important to realize 
that developments in all three have been mutually reinforcing. Indeed,
one impact of a new device has frequently been the uncovering of new 
signal-processing questions. As an example, the development of the wave 
filter led naturally into Nyquist’s investigation of the properties of band- 
limited waveforms. 


Communication Theory 


Given that it is possible to perform a sequence of signal-processing 
operations, when is it desirable to do so and what are the advantages and 
limitations? Such questions and their answers constitute the corpus of 
what is called communication theory. This theory has assumed increasing 
importance since the advent of digital computers provided opportunities 
for signal-processing orders of magnitude subtler and more complex than
ever possible before. 

The beginnings of communication theory lie in the work of Nyquist, 60 
who in 1924 extended unpublished work by J. R. Carson and concluded 
that the number of resolvable (noninterfering) pulses that can be trans- 
mitted per second over a bandlimited channel is proportional to the 
channel bandwidth. More exactly, Nyquist concluded that the maximum 
number of pulses resolvable in a T-sec interval with a channel of bandwidth
W cps is kTW; here k is a proportionality factor no greater than
2, the exact value of which depends on the pulse waveshape and the 
particular definition of “bandwidth.” 

Shortly thereafter, in 1928, Hartley 41 reasoned that Nyquist’s result, 
when coupled with a limitation on the accuracy of signal reception, 
implied a restriction on the amount of data that can be communicated 
reliably over a physical channel. Hartley’s argument may be summarized 
as follows. If we assume that (1) the amplitude of a transmitted pulse is
confined to the voltage range [-A, A] and (2) the receiver can estimate a
transmitted amplitude reliably only to an accuracy of ±Δ volts, then, as
illustrated in Fig. 1.1, the maximum number of pulse amplitudes
distinguishable at the receiver is (1 + A/Δ). It follows that a sequence of
kTW resolvable pulses, each of which can assume any one of (1 + A/Δ)
amplitudes, affords a total of

    M = (1 + A/Δ)^{kTW}    (1.1)

distinguishable received signals.

Figure 1.1 Distinguishable receiver amplitudes. Hartley considered received pulse
amplitudes to be distinguishable only if they lie in different zones of width 2Δ. Thus
pulses a and c are distinguishable but a and b are not. For the case shown A/Δ = 4
and there are five distinguishable zones.

As illustrated in Fig. 1.2, an equal number of distinguishable
transmitter pulse sequences can be constructed and used to communicate one
of M different possible messages reliably in time T. The procedure,
indicated in Fig. 1.3, is to associate each distinguishable transmitter sequence
uniquely with one of the M messages, say m_0, m_1, ..., m_{M-1}, and to
transmit the kth sequence if and only if the actual transmitter input is m_k.
Hartley concluded that if we attempt to increase M above the value
specified in Eq. 1.1 by transmitting more than kTW pulses or by using
pulse amplitude levels less than 2Δ volts apart, the signaling strategy of
Fig. 1.2 will break down. The receiver no longer distinguishes reliably
between all signal sequences, and communication becomes unsatisfactory.
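
As a quick numerical check of Eq. 1.1 and of the association between messages and pulse sequences, the following Python sketch counts the distinguishable sequences and maps a message index to one of them. The function names are illustrative, and treating the message index as a number written in base (1 + A/Δ) is merely one convenient assumption for making the association unique; it is not part of Hartley's argument. The sketch also assumes that A/Δ is an integer.

```python
def hartley_count(a_over_delta, num_pulses):
    """Number of distinguishable sequences, M = (1 + A/Δ)^(kTW)."""
    return (1 + a_over_delta) ** num_pulses


def message_to_levels(k, a_over_delta, num_pulses):
    """Map message index k to a sequence of amplitude-level indices.

    Level index j (0 <= j <= A/Δ) stands for the amplitude -A + 2*j*Δ,
    so adjacent levels are 2Δ volts apart.
    """
    levels_per_pulse = 1 + a_over_delta
    if not 0 <= k < levels_per_pulse ** num_pulses:
        raise ValueError("message index out of range")
    levels = []
    for _ in range(num_pulses):
        k, digit = divmod(k, levels_per_pulse)
        levels.append(digit)
    return levels[::-1]  # first pulse carries the most significant digit


def levels_to_message(levels, a_over_delta):
    """Inverse mapping, as a receiver with reliable level estimates would use."""
    k = 0
    for digit in levels:
        k = k * (1 + a_over_delta) + digit
    return k


# Case of Fig. 1.2: A/Δ = 4 and kTW = 6.
print(hartley_count(4, 6))
print(message_to_levels(12345, 4, 6))
```

Because every index from 0 to M - 1 yields a different level sequence, and every level sequence maps back to its index, the association is one-to-one, as the procedure requires.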

Figure 1.2 Distinguishable transmitter sequences. Two sequences of received pulses
are distinguishable if one or more of their constituent pulse amplitudes are
distinguishable. The two transmitter sequences s_i(t) and s_j(t) illustrated above lead to
distinguishable receiver sequences whenever each pulse amplitude is altered by less than ±Δ
during propagation and are therefore called distinguishable. We may construct
M = (1 + A/Δ)^{kTW} such sequences by allowing each pulse to assume any one of the
(1 + A/Δ) amplitudes indicated by the dashed lines. (For the case shown, A/Δ = 4,
kTW = 6, hence M = 15,625.)

Figure 1.3 Discrete message transmission. There are M messages {m_i}, and M
corresponding signal sequences {s_i(t)}. The transmitted signal, s(t), is s_k(t)
whenever m is m_k.

Hartley’s formulation exhibits a simple but somewhat inexact
interrelation among the time interval T, the channel bandwidth W, the
maximum signal magnitude A, the receiver accuracy Δ, and the allowable
number M of message alternatives. Communication theory is intimately
concerned with the determination of more precise interrelations of this
sort. It is also concerned with maximizing the distinguishability of the
transmitted message by appropriate signal processing (waveform design)
at the transmitter, with processing the received signal to determine the
transmitted message as accurately as possible (or as accurately as is
justified economically), and with the complexity of implementing the
transmitter and receiver signal processors.

Randomness 

The essence of communication is randomness. If a listener knew in 
advance exactly what a speaker would say, and with what intonation he 
would say it, there would be no need to listen! Thus communication 
theory involves the assumption that the transmitter is connected to a 
random source, the output of which the receiver cannot with certainty 
predict. Otherwise, no communication problem exists. 

Although less obvious, it is also true that there is no communication 
problem unless the transmitted signal is disturbed during propagation or 
reception in a random way. By way of example, consider communicating 
the content of a book chosen at random from the Library of Congress 
and assume that the alphabet (plus punctuation and numerals) comprises 
64 symbols. To each symbol we can assign a six-digit binary number; for
instance,

    a:  0 0 0 0 0 0
    b:  0 0 0 0 0 1
    c:  0 0 0 0 1 0
    . . .
    9:  1 1 1 1 1 1.


The total content of the selected volume can then be written as a single 
long sequence of the binary symbols 0, 1 by allotting the first six digits 
of the sequence to the first letter of the volume, the next six digits to the 
second letter, and so forth. Finally, we may interpret the resulting binary 
sequence as a binary number between zero and one by placing a binary 
point at the beginning of the sequence, as shown in Fig. 1.4a.

Figure 1.4 (a) A message represented by a binary number. If successive binary
symbols are denoted a_1, a_2, ..., a_N, as shown, the value of the number is
a_1 · 2^{-1} + a_2 · 2^{-2} + ... + a_N · 2^{-N}. Here N is six times the total number
of letters constituting the message. (b) A pulse with amplitude equal to the number
representing one of 2^N messages.

We now observe that the entire volume can be designated by a single
Nyquist pulse. As indicated in Fig. 1.4b, we need only adjust the pulse
amplitude to equal the value of the binary number in Fig. 1.4a. Indeed,
if the transmitted amplitude could be precisely determined by a receiver,
not only a single volume but also the contents of the whole Library of
Congress could be communicated in this way by means of a single
amplitude value. The procedure, however, is clearly preposterous. Small
disturbances, called noise, always preclude either transmitting or receiving
with such incredible precision. In Hartley’s result, Eq. 1.1, the precision
limitation implied by noise is incorporated in the accuracy, or quantization,
parameter Δ.
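
The construction can be tried out in a short Python sketch. The 64-symbol alphabet below is a hypothetical choice: the text fixes only the codes of a, b, c, and the final symbol 9, and the remaining assignments are assumptions made for illustration. Exact rational arithmetic stands in for a receiver of perfect precision, and rounding the amplitude to a 64-bit floating-point number stands in, loosely, for a very small disturbance.

```python
from fractions import Fraction
import string

# Hypothetical 64-symbol alphabet; only a, b, c and the final symbol 9
# are fixed by the example in the text.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + " ." + string.digits
assert len(ALPHABET) == 64


def text_to_bits(text):
    """Concatenate the six-digit binary code of each symbol."""
    return "".join(format(ALPHABET.index(ch), "06b") for ch in text)


def bits_to_amplitude(bits):
    """Value of the binary number 0.a1 a2 ... aN as an exact fraction."""
    return Fraction(int(bits, 2), 2 ** len(bits))


def amplitude_to_text(amplitude, num_symbols):
    """Read back num_symbols symbols from a given amplitude value."""
    n = 6 * num_symbols
    value = int(amplitude * 2 ** n)
    bits = format(value, "0{}b".format(n))
    return "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, n, 6))


message = "Small disturbances always preclude such precision."
amplitude = bits_to_amplitude(text_to_bits(message))

# With the amplitude known exactly, the whole message is recovered.
exact = amplitude_to_text(amplitude, len(message))

# A 64-bit float keeps only about 53 significant binary digits, so
# rounding the amplitude already destroys most of the message.
noisy = amplitude_to_text(Fraction(float(amplitude)), len(message))

print(exact == message)
print(noisy == message)
```

Even a 50-symbol message requires 300 binary digits of amplitude precision, which suggests how quickly the scheme outruns any physically attainable accuracy.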

Probabilistic Formulation of the Communication Problem 

Although it recognizes the importance of noise, Hartley’s conclusion 
does not account for the empirical fact that any receiver will occasionally 
estimate a transmitted amplitude incorrectly, regardless of how large a 
quantization grain Δ is designed into the communication system. The
next major advance in communication theory occurred in 1942, when
Norbert Wiener 84 ingeniously circumvented this difficulty by adopting a
totally different point of view. His approach included the situation
illustrated in Fig. 1.5, in which the received signal r(t) is the sum of a
desired random message waveform m(t) and an unwanted noise waveform
n(t). Wiener then solved the optimum linear filtering problem; that is,
he determined the linear filter whose output is the best mean-square
approximation to m(t) when r(t) is the filter input.

Figure 1.5 A communication problem considered by Wiener. The optimum filter
minimizes the average value of the squared error [m(t) - m̂(t)]^2, in which m̂(t) denotes
the receiver’s estimate of m(t).

The use of the word “optimum” entered the world of communication 
engineering primarily in the pioneering work of Wiener. Many problems, 
however, remained unresolved. In particular, a message waveform m(t),
such as speech, is not often transmitted directly; instead, m(t) is used to
modulate (control some parameter of) the actual transmitted signal, say
s(t), as indicated in Fig. 1.6. Questions related to optimizing the
transformation m(t) → s(t) and to processing the received signal when this
transformation is nonlinear could not be answered until Rice 69 developed a
satisfactory representation of the effects of noise, in 1944.

Figure 1.6 The transmitter transforms m(t) into a signal s(t) suitable for propagation
over the channel.

Kotel’nikov 51 addressed himself to these questions in 1947. He succeeded
not only in analyzing all modulation systems then in existence but also in
stating certain fundamental and unavoidable performance limitations on
all possible future modulation and receiver systems. A significant portion
of this book is based on Kotel’nikov’s methods and results.

Communication theory reached maturity in the work of Shannon 75 in
1948. Previously the intuitively apparent but erroneous concept that noise
placed an inescapable restriction on the accuracy of communication had
been universally accepted. In sharp contradiction Shannon proved that
the transmission effects of noise, constrained bandwidth, and restricted
signal magnitude can be incorporated into a parameter, C, called the
channel capacity. The significance of channel capacity is this: provided
the number M of message alternatives grows as a function of the signal
duration T slowly enough so that

    M < 2^{CT},    (1.2a)

then arbitrarily high communication accuracy can be obtained in principle
by choosing T large enough; that is, by using signals that are sufficiently
long. Conversely, Shannon also showed that reliable communication is
not possible, regardless of the signal-processing schemes adopted at
transmitter and receiver, whenever

    M > 2^{CT}.    (1.2b)

A major fraction of communication research since 1948 and a corre- 
sponding fraction of this text have been devoted to extending these results 
and determining how they may be approximated in engineering practice. 
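
To give Eqs. 1.2a and 1.2b a numerical feel, the sketch below compares log2 M with the product CT. The capacity value, the signal duration, and the message counts are hypothetical numbers chosen only for illustration, and C is assumed to be expressed in binary digits per second so that CT is a pure number. Working with logarithms avoids forming the very large number 2^{CT} directly.

```python
import math


def within_capacity(num_messages, capacity, duration):
    """True when M < 2^(C*T), the condition of Eq. 1.2a."""
    return math.log2(num_messages) < capacity * duration


def max_message_bits(capacity, duration):
    """The product C*T, which log2(M) must not reach (Eq. 1.2a)."""
    return capacity * duration


# Hypothetical channel: C = 1000 binary digits per second, T = 2 sec.
C = 1000.0
T = 2.0
print(max_message_bits(C, T))
print(within_capacity(2 ** 1500, C, T))  # log2 M = 1500 is below C*T
print(within_capacity(2 ** 2500, C, T))  # log2 M = 2500 exceeds C*T
```

The check says nothing about how to build signals that achieve reliable communication; it only tests on which side of the boundary a proposed number of message alternatives falls.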

1.2 PLAN OF THE BOOK 

A second principal result of Shannon’s work has been the recognition 
that communication is fundamentally a discrete process. By this we mean 
that a receiver can meaningfully distinguish between only a finite num- 
ber of message alternatives in a finite time. Although proof that the 
communication process is discrete — apparently even when the ultimate 
transmitter and receiver are human beings — exceeds the scope of an 
introductory text, an appreciation of this point of view can be gained 
by considering how well an accomplished novelist uses a finite alphabet 
to convey not only meaning but emotion. Indeed, the primary difficulty 
encountered in extending discrete analysis to voice communication is 
simply that no adequate criterion has thus far been discovered for describ- 
ing the subjective equivalence to a listener of many quite different speech 
waveforms. 

Once the fact that a receiver can distinguish meaningfully between only 
a finite number of message alternatives has been accepted, it follows that 
no significant loss in communication performance is entailed in restricting 
the transmitter to sending one of a finite set of signals. The block diagram 
of such a communication system is illustrated in Fig. 1.7. As in Fig. 1.3, 
the source output m is assumed to be generated at random from a set of 
M possible discrete messages, {m_i}, i = 0, 1, ..., M-1. Each message
m_i is associated with a corresponding signal waveform s_i(t),
and the transmitter sends s_k(t) whenever m is m_k. The transmitted signal
then propagates through the channel, and a corrupted version r(t) is 
delivered to the receiver input. 

Figure 1.7 Block diagram of a discrete communication system.


The task of the receiver is to produce an estimate, m̂, of the message
generated by the source. It does this by comparing r(t) against each
member of the set of all M signal alternatives {s_i(t)}, replicas of which we
presume are stored in the receiver. 

The receiver structure shown in Fig. 1.7 is quite general. The receiver
consists of a linear “front end” that compensates for attenuation during
propagation, a set of M detectors, and a decision element. Each detector
performs one of the comparison operations. In particular, the ith detector
compares the received waveform r(t) with the ith signal waveform s_i(t) and
produces a voltage value, say, u_i, that is a measure of their similarity. The
decision element then determines m̂ on the basis of these {u_i}, i = 0,
1, ..., M-1. For certain choices of the {s_i(t)} a single detector may
suffice, in which case the receiver diagram reduces to that shown in Fig. 1.8.
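
The bank of detectors followed by a decision element can be sketched in a few lines of Python. The text has not yet said how a detector should measure similarity; purely for illustration, the sketch assumes that u_i is the negative squared distance between sampled versions of r(t) and s_i(t) and that the decision element selects the largest u_i. The signal set and the received samples are hypothetical.

```python
def detector(received, signal):
    """Similarity u_i between r(t) and s_i(t): negative squared distance
    between their samples (an assumed measure, chosen for illustration)."""
    return -sum((r - s) ** 2 for r, s in zip(received, signal))


def decide(received, signals):
    """Decision element: return the index i whose u_i is largest."""
    similarities = [detector(received, s) for s in signals]
    return max(range(len(signals)), key=lambda i: similarities[i])


# Hypothetical set of M = 3 stored signal replicas, four samples each.
signals = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, -1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
]

# Received waveform: signal 1 plus a small disturbance.
received = [0.8, -1.1, 1.3, -0.7]
estimate = decide(received, signals)
print(estimate)
```

Here the front end is taken to have already compensated for attenuation, so that the received samples are on the same scale as the stored replicas.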

The chapters that follow are organized around these block diagrams. 
We begin by considering the point labeled a in Fig. 1.8 and by assuming 
that the entire communication system, with the exception of the decision 
element, has already been designed. In Chapter 2 we introduce the 
mathematical tool — probability theory — that is necessary for determining 
how best to design this element. 

Figure 1.8 In certain cases the receiver of Figure 1.7 may be reduced to the form
shown above.

Chapter 3 is devoted to extending the concepts of probability theory to
the study of random waveforms. In Chapter 4 we consider first the point
labeled b in Fig. 1.7 and exploit the results of Chapters 2 and 3 to
determine the optimum detector and decision operations when transmission is
disturbed by white Gaussian noise. Chapter 4 continues with a discussion
of signal design (point c in Fig. 1.7) and concludes with an evaluation of
the over-all system when the {s_i(t)} are chosen to yield the best possible
performance.

In Chapter 5 we study the effects introduced by constraints on the 
allowable transmitter power and the available channel bandwidth (cor- 
responding to point d in Fig. 1.7). In particular, bounds are established 
on the best attainable performance, and classes of signals that essentially 
attain these bounds are described. Questions of transmitter and receiver 
implementation are considered in Chapter 6, and the over-all theory is 
discussed in relation to a telephone line data communication experiment. 

Chapter 7 is concerned with the extension of the preceding results to 
bandpass channels, to filtered signals, and to nonwhite noise. Certain 
effects of random scattering during propagation are described and 
evaluated. 

Finally, in Chapter 8 we consider the case in which the output of the 
random source is a continuous waveform, such as speech, rather than one 
of a finite set of discrete messages. Conventional modulation systems are 
evaluated, and their performance is related to that afforded by discrete 
signaling. The chapter concludes with a determination of the fundamental 
limitations of continuous modulation and a discussion of the inherent 
advantage obtainable in a discrete approach to the communication 
problem. 

1.3 THE ROLE OF COMMUNICATION THEORY 

It is interesting that ingenious experimentation has often led historically 
to advances in communication technology far antedating real under- 
standing of the principles involved. For example, frequency modulation 
(abbreviated FM) came into widespread use soon after Armstrong 3 first 
appreciated its noise-suppression capability in 1936, even though to this 
day some aspects of FM noise behavior remain puzzling and are the 
subject of active research. Moreover, the basic idea of frequency modu- 
lation had been devised long before in a misguided attempt to conserve 
transmission bandwidth and had lain essentially dormant subsequent to 
Carson’s disproof 17 of such a characteristic in 1922. In the past the role 
of communication theory frequently has been to explain rather than to 
foretell. 

On the other hand, the basic conceptual aspects of communication are 
now on solid ground, and an extensive body of methodology and results 
has been accumulated. Although innumerable strides of invention, both
theoretical and experimental, remain to be taken, it appears increasingly
likely that future advances will germinate within the framework of com- 
munication theory. Even when a problem is best approached experi- 
mentally, appreciation of the principles underlying communication 
engineering will provide insight vital to guiding the experiments to be 
performed. 


Probability Theory


In our discussion of communication thus far we have emphasized the 
central role played by the concept of “randomness.” If the ultimate 
receiver knew in advance the message output from the originating source, 
there would be no need to communicate; and if the propagation of electro- 
magnetic signals were not disturbed by nature, to communicate the 
message would be no problem. The word “random” means “unpredictable”;
on the basis of what we know about the past of a phenomenon, we
are unable to predict its future in detail. A considerable body of
mathematics (calculus, for example) has been developed to treat causal
phenomena occurring in the real world. Similarly, mathematical models have
been developed that are useful in the study of real-world random
phenomena. The objective of Chapters 2 and 3 is to present the mathematical
background essential to our further study of communication.

2.1 RANDOMNESS IN THE REAL WORLD

Our inability to predict the detailed future of a random phenomenon 
may arise either from ignorance or laziness: to the limit of our knowledge,
the laws governing a progression of events may be fundamentally random
(as in quantum physics); or they may be so complicated and involve such
critical dependence on initial conditions (as in coin tossing) that we deem 
it unprofitable to undertake a detailed analysis. 

A pertinent example of randomness is the transmission of radio waves 
through the ionosphere, illustrated in Fig. 2.1. Radio waves at certain
frequencies are refracted as they pass through the ionized gas that
constitutes the ionosphere. The degree of refraction depends on the detailed
structure of the ionosphere, which depends, in turn, on the amount of 
ionizing solar radiation, the incidence and velocities of meteors, and on 
many other factors. 

Figure 2.1 Refraction of radio waves by the ionosphere.

The voltage at the terminals of the receiving antenna is the resultant
of a number of waves traveling over a variety of different paths. The
attenuation and propagation delay vary from path to path at any given
instant of time and vary with time for any given path. The causes of these
variations are far too complex to be calculable in detail. Thus the
receiving antenna terminal voltage varies in a manner unpredictable in
detail. We say it varies randomly.

Although we cannot predict exactly what the antenna output voltage 
will be, we find experimentally that certain average properties do exhibit a 
reasonable regularity. The received power averaged over seconds does not 
vary greatly over minutes; the received power averaged over a month does 
not differ greatly from that averaged over another month characterized by 
the same solar activity. 

This statistical regularity of averages is an experimentally verifiable 
phenomenon in many different situations involving randomly varying 
quantities. We are therefore motivated to construct a mathematical model 
adequate for the study of such phenomena. This is the domain of the 
mathematical field of probability and statistics. 

Random Experiments 

To avoid confusion, we introduce the following terminology. By an 
experiment in the real world we mean a measurement procedure in which 
all conditions are predetermined to the limit of our ability or interest. We 
use the word trial to mean the making of the measurement. By a sequence 
of N independent trials of an experiment we mean a set of N measurements, 
in the performance of each of which the discernible conditions are the 
same. 

An experiment is called random when the conditions of the measurement 
are not predetermined with sufficient accuracy and completeness to permit 
a precise prediction of the result of a trial. Whether an experiment should 
be considered random depends on the precision with which we wish to 
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distinguish between possible outcomes. If we desire (or are able) to look 
closely enough, in some sense any experiment is random. 

The discussion above leads us to distinguish in connection with an
experiment between the terms outcome and result. By different outcomes 
we mean outcomes that are separately identifiable in an ultimate sense; 
in general, the set of outcomes in any real-world experiment is infinite. By 
different results we mean sets of outcomes between which we choose to 
distinguish. Thus the outcomes that are classified into a result share some
common identifiable attribute. For example, a result in our propagation 
experiment might be that the received power at the antenna terminals, 
averaged over T sec, is between 10 and 15 μw. Such a result clearly
embraces an infinitude of different possible received waveforms, or
outcomes. 

Relative Frequencies 

We can now discuss more precisely what we mean by statistical regu- 
larity. Let A denote one of the possible results of some experiment and 
consider a sequence of N independent trials. Denote by N(A) the number 
of times that result A occurs. The fraction 

$$f_N(A) = \frac{N(A)}{N} \tag{2.1}$$

is called the relative frequency of the result A. Clearly,

$$0 \le f_N(A) \le 1. \tag{2.2}$$

In Fig. 2.2 we plot $f_N(A)$ versus N for a typical sequence of trials in a
coin-tossing experiment, where A denotes the result “Heads.” We observe
that the relative frequency fluctuates wildly for small N but eventually
settles down in the vicinity of 1/2. This stabilization of the average incidence
of Heads in a large sequence of repeated trials is a simple example of 
statistical regularity. In fact, we are so imbued with the notion that this 
stability is proper that were it not in evidence we would immediately 
suspect either the coin or the tosser. We feel intuitively that statistical 
regularity is a fundamental attribute of nature. 
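The stabilization of relative frequency can be illustrated numerically. The following Python sketch simulates coin tosses with a pseudo-random generator and records $f_n(A)$ after every trial. The function name `relative_frequencies`, the seed, and the number of trials are illustrative assumptions, and a pseudo-random generator is only a stand-in for a physical coin.

```python
import random

def relative_frequencies(num_trials, p_heads=0.5, seed=0):
    """Return [f_1(A), ..., f_N(A)] for A = "Heads" in simulated tosses."""
    rng = random.Random(seed)          # seeded so that the run is repeatable
    heads = 0
    freqs = []
    for n in range(1, num_trials + 1):
        if rng.random() < p_heads:     # one trial: the result A occurs
            heads += 1
        freqs.append(heads / n)        # f_n(A) = n(A) / n
    return freqs

if __name__ == "__main__":
    f = relative_frequencies(100_000)
    # Sample the curve at logarithmically spaced N, as in a plot of f_N(A).
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, f[n - 1])
```

Printing the values at logarithmically spaced N should show large fluctuations at small N and values near 1/2 at large N, although the exact numbers depend on the seed.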

Figure 2.2 Relative frequency $f_N(A)$ in coin tossing. (N is plotted on a logarithmic scale.)

We often denote different results of an experiment by different subscripts;
for instance $A_1, A_2, \ldots, A_M$. Results that cannot happen
simultaneously in a given trial are called mutually exclusive. As a trivial
example, in a coin toss the results Heads (say $A_1$) and Tails (say $A_2$) are
mutually exclusive. For mutually exclusive results it is clear that the
occurrence of the result “either $A_i$ or $A_j$” satisfies the equality

$$N(A_i \text{ or } A_j) = N(A_i) + N(A_j);$$

hence

$$f_N(A_i \text{ or } A_j) = f_N(A_i) + f_N(A_j). \tag{2.3}$$

Another example is tossing a die, with $A_i$ denoting the result that the ith
face shows. The result “odd face shows” is therefore the result “$A_1$ or
$A_3$ or $A_5$.” Clearly,

$$f_N(A_1 \text{ or } A_3 \text{ or } A_5) = f_N(A_1) + f_N(A_3) + f_N(A_5).$$

For a fair die we expect the relative frequency of each $A_i$ to stabilize about
1/6. Thus we expect the relative frequency of the result “odd face shows”
to stabilize at 1/2.
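The additivity of relative frequencies for mutually exclusive results holds exactly for every N, not merely in the limit. A minimal Python sketch of the die example follows; the simulated die, the seed, and the name `die_relative_frequencies` are assumptions made for illustration.

```python
import random
from fractions import Fraction

def die_relative_frequencies(num_trials, seed=1):
    """Return ({face: f_N(A_face)}, f_N(odd face shows)) for simulated rolls."""
    rng = random.Random(seed)
    counts = {face: 0 for face in range(1, 7)}
    odd_count = 0
    for _ in range(num_trials):
        face = rng.randint(1, 6)       # one trial of the experiment
        counts[face] += 1
        if face % 2 == 1:              # the result "odd face shows"
            odd_count += 1
    freqs = {face: Fraction(c, num_trials) for face, c in counts.items()}
    return freqs, Fraction(odd_count, num_trials)

if __name__ == "__main__":
    f, f_odd = die_relative_frequencies(6000)
    # Exact equality, because A_1, A_3, A_5 are mutually exclusive.
    print(f_odd == f[1] + f[3] + f[5], float(f_odd))
```

Exact rational arithmetic is used so that the equality can be checked without rounding error.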

2.2 MATHEMATICAL MODEL OF PROBABILITY THEORY 

Mathematical models prove useful in predicting the results of experi- 
ments in the real world when two conditions are met. First, the pertinent 
physical entities and their properties must be reflected in the model. 
Second, the properties of the model must be mathematically consistent 
and permissive of analysis. 

We have seen that real-world random experiments involve three 
pertinent entities: 

1. The set of all possible experimental outcomes. 

2. The grouping of these outcomes into classes, called results, between
which we wish to distinguish.

3. The relative frequencies with which these classes occur in a long
sequence of independent trials of the experiment.

In the mathematical model of probability theory the corresponding 
abstractions are called: 

1. The sample space. 

2. The set of events. 

3. The probability measure defined on these events. 

We begin our discussion by defining these three mathematical entities. 
We then develop our model by assigning to them mathematically con- 
sistent properties that reflect constraints in the real world. We conclude 
with a series of examples that develops further the correspondence 
between our abstract entities and their real-world correlatives. 



Figure 2.3 A sample space. Each graph $A_i$ is associated with the sample point $\omega_i$,
i = 1, 2, 3, 4.

Fundamental Definitions 

Sample space: a collection of objects. The collection is generally referred
to by the symbol $\Omega$. An object in $\Omega$ is called a sample point and
denoted $\omega$. As examples, $\Omega$ might consist of

the set of 4 graphs shown in Fig. 2.3,
several points on the real line,
the closed interval [0, 1] of the real line,
all points in a plane,
all time functions f(t) defined for $-\infty < t < \infty$.

The sample space $\Omega$ corresponds to the set of all possible outcomes of a
real-world experiment; each outcome, in turn, corresponds to a sample
point.

Event: a set of sample points. We usually label events by capital letters,
such as A, B, . . . , or $A_1, A_2, \ldots$. An event is concisely defined by the
expression

$$A = \{\omega : \text{some condition on } \omega \text{ is satisfied}\}, \tag{2.4}$$

which is read “the event A is the set of all $\omega$ such that some condition on $\omega$
is satisfied.”† For example, if $\Omega$ is the x, y plane and $\rho^2 = x^2 + y^2$, a possible
event is $A = \{\omega : \rho < 1\}$. Then A is the set of all points interior to a
unit circle centered on the origin. Similarly, if $\Omega$ is the set of all time
functions, a possible event A is the subset of all time functions such that

$$2 < \int_{-\infty}^{\infty} f^2(t)\,dt < 2.5.$$

Since the entire sample space is a set of sample points, $\Omega$ itself is always an
event.

Events in the mathematical model correspond to results in the real 
world. 

Probability measure: an assignment of real numbers to the events
defined on $\Omega$. The probability of an event A is denoted P[A]. The conditions
that the assignment must satisfy will be discussed subsequently.

Example 1. If the sample space $\Omega$ is the set of 4 graphs shown in Fig. 2.3
and we define the event $A_i$ to be the ith graph (sample point), a possible
probability assignment is

$$P[A_1] = \tfrac{1}{2}, \qquad P[A_2] = P[A_3] = \tfrac{1}{4}, \qquad P[A_4] = 0.$$

Example 2. If $\Omega$ is the real line segment $0 \le \omega \le 1$ and we define the
events $A_l = \{\omega : 0 \le \omega \le l\}$, $l \le 1$, a possible probability assignment is

$$P[A_l] = l.$$

Example 3. If $\Omega$ is the set of all time functions {f(t)} and we define
the events $A_x = \{\omega : 0 < f(0) < x\}$, a possible probability assignment is

$$P[A_x] = 1 - e^{-x}.$$

The probability assigned to an event corresponds to that value at which
we expect the relative frequency of the associated result to stabilize in a
long sequence of independent real-world experimental trials.

† Throughout this text, braces are used to denote a set: for example, $\{A_i\}$ denotes the
collection of all $A_i$, i = 1, 2, . . . .

Ancillary Definitions

The definition of a sample space $\Omega$ and events such as A, B, . . . implies
the existence of certain other identifiable sets of points.

1. The complement of A, denoted $A^c$, is the event containing all points
in $\Omega$ but not in A.

$$A^c \triangleq \{\omega : \omega \text{ not in } A\}. \tag{2.5a}$$

2. The union of A and B, denoted $A \cup B$, is the event containing all
points either in A or B or both.

$$A \cup B = \{\omega : \omega \text{ in } A \text{ or } B \text{ or both}\}. \tag{2.5b}$$

3. The intersection of A and B, denoted AB, is the event containing all
points in both A and B.†

$$AB = \{\omega : \omega \text{ in both } A \text{ and } B\}. \tag{2.5c}$$

4. The event containing no sample points is called the null event,
denoted $\emptyset$. Thus $\Omega^c = \emptyset$.

5. Two events A and B are called disjoint if they contain no common
point, that is, if $AB = \emptyset$.

The relations between the operations complementation, union, and
intersection are easily visualized geometrically. In Fig. 2.4 the events
$\Omega$, A, B, and C are represented by sets of points lying within labeled
closed contours. Such drawings are called Venn diagrams. From Fig. 2.4
it is immediately obvious that

$$A \cup A^c = \Omega, \tag{2.6a}$$

$$A A^c = \emptyset = \Omega^c, \tag{2.6b}$$

$$A\Omega = A. \tag{2.6c}$$

Moreover, further study of Fig. 2.4 reveals that

$$(AB)^c = A^c \cup B^c \tag{2.7a}$$

$$(A \cup B)^c = A^c B^c \tag{2.7b}$$

$$A \cup B = (AB^c) \cup (AB) \cup (A^c B), \tag{2.7c}$$

where the three events on the right-hand side of Eq. 2.7c are disjoint.

† Intersection is also denoted $A \cap B$. We use this notation only when necessary
for clarity.

Figure 2.4 Venn diagrams.


A moment’s reflection makes it clear that the union and intersection
operations are commutative, associative, and distributive. That is to say,

commutative:

$$A \cup B = B \cup A$$
$$AB = BA$$

associative:

$$A \cup (B \cup C) = (A \cup B) \cup C$$
$$A(BC) = (AB)C$$

distributive:

$$A(B \cup C) = (AB) \cup (AC)$$
$$A \cup (BC) = (A \cup B)(A \cup C).$$
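For a small finite sample space these set relations can be confirmed by exhaustive enumeration. The sketch below is a minimal illustration using Python's built-in `frozenset`; the five-point sample space `OMEGA` and the helper names are assumptions chosen for the example, and a check on one finite space is of course not a proof.

```python
from itertools import combinations

OMEGA = frozenset(range(5))   # hypothetical sample space with five points

def complement(event):
    """A^c: all points of OMEGA that are not in the event."""
    return OMEGA - event

def all_events(space):
    """Every subset of the sample space, from the null event up to OMEGA."""
    points = sorted(space)
    return [frozenset(c) for r in range(len(points) + 1)
            for c in combinations(points, r)]

def check_identities():
    """Assert Eqs. 2.6 and 2.7 and the distributive laws; return the event count."""
    events = all_events(OMEGA)
    for a in events:
        assert a | complement(a) == OMEGA              # Eq. 2.6a
        assert a & complement(a) == frozenset()        # Eq. 2.6b
        assert a & OMEGA == a                          # Eq. 2.6c
        for b in events:
            assert complement(a & b) == complement(a) | complement(b)   # Eq. 2.7a
            assert complement(a | b) == complement(a) & complement(b)   # Eq. 2.7b
            parts = [a & complement(b), a & b, complement(a) & b]
            assert a | b == parts[0] | parts[1] | parts[2]              # Eq. 2.7c
            assert all(not (x & y) for x, y in combinations(parts, 2))  # disjoint
            for c in events:
                assert a & (b | c) == (a & b) | (a & c)                 # distributive
                assert a | (b & c) == (a | b) & (a | c)
    return len(events)

if __name__ == "__main__":
    print(check_identities())   # number of events examined
```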

Properties 

In a long sequence of N independent trials of a random experiment in
the real world the results $\{A_i\}$ and the observed frequencies $\{f_N(A_i)\}$ with
which these results occur meet certain conditions:

1. The relative frequency $f_N(A_i)$ of every result satisfies the inequalities

$$0 \le f_N(A_i) \le 1.$$

2. Every trial of an experiment has an outcome.

3. If two results A and B are mutually exclusive, then
$f_N(A \text{ or } B) = f_N(A) + f_N(B)$.

Since our objective is to use probability theory to predict the results
of real-world random experiments, it is reasonable that similar conditions
should be imposed on corresponding entities in our mathematical
model. We therefore restrict our assignment of probability measure to
have the following properties.

I. To every event $A_i$ a unique number $P[A_i]$ is assigned such that

$$0 \le P[A_i] \le 1.$$

II. $P[\Omega] = 1$.

III. If $AB = \emptyset$, $P[A \cup B] = P[A] + P[B]$.

These properties, motivated from real-world considerations, are all that 
we require in our present discussion of the mathematical model. They are 
also adequate for a formal, axiomatic development of probability theory
whenever the totality of events on $\Omega$ (defined to include every complement,
union, and intersection of events) is finite. (When this totality of events is
infinite, it is necessary in an axiomatic development to specify carefully the
collection of events defined on $\Omega$ and to extend property III to include
infinite unions of disjoint events. These modifications extend the scope of
the theorems derivable from the axioms.)

Properties I to III have several immediate implications. Since $A^c A = \emptyset$
and $A \cup A^c = \Omega$, II and III imply that

$$P[A] + P[A^c] = 1$$

or

$$P[A^c] = 1 - P[A]. \tag{2.8a}$$

In particular, when $A = \Omega$,

$$P[\emptyset] = 1 - P[\Omega] = 0. \tag{2.8b}$$

Also, since the events (AB), $(AB^c)$, and $(A^c B)$ are disjoint, property III
implies that

$$P[A] = P[AB] + P[AB^c]$$

$$P[B] = P[AB] + P[A^c B];$$

hence, from Eq. 2.7c, we have

$$P[A \cup B] = P[AB^c] + P[AB] + P[A^c B] = P[A] + P[B] - P[AB], \tag{2.9}$$

$$P[A \cup B] \le P[A] + P[B]. \tag{2.10}$$


Probability Systems 

A sample space, a set of events, and a probability assignment to the
events together constitute a probability system. The probability assignment
must be complete, in the sense that if events A and B are assigned
probabilities a probability must also be assigned to the intersection AB
and (by Eq. 2.9) to the union $A \cup B$. We now consider two examples that
illustrate the assignment of probabilities.

Finite sample spaces. A probability system in which $\Omega$ has only a
finite number of points, say k, is especially simple. The maximum number
of distinct events that can be defined on such a sample space is exactly $2^k$,
since each of the k points may or may not be included in any particular
event. For instance, if $\Omega$ consists of the three points $\omega_1, \omega_2, \omega_3$, the most
general set of events can be denoted by the binary sequences

$$\begin{array}{ll}
A_0 = (000) = \emptyset & A_4 = (100) \\
A_1 = (001) & A_5 = (101) \\
A_2 = (010) & A_6 = (110) \\
A_3 = (011) & A_7 = (111) = \Omega;
\end{array}$$

the convention is to set the ith digit of a k-digit sequence equal to 1 or 0,
the choice depending on whether or not the point $\omega_i$ is included in the
event.

The most general probability assignment for the $2^k$ possible events can
be obtained by associating with each point $\omega_i$ in $\Omega$ a non-negative number
$P_i$ such that

$$\sum_{i=1}^{k} P_i = 1. \tag{2.11a}$$

The probability of an event A is then taken to be the sum of the $P_i$’s of the
points it contains. We write

$$P[A] = \sum_{i \in I} P_i, \tag{2.11b}$$

where I denotes the set of subscripts of sample points constituting A. For
example, the probability of the event $A_3$ defined in the preceding paragraph
is $P[A_3] = P_2 + P_3$. It is evident that probabilities assigned in this
way meet the conditions of properties I to III.
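The construction for finite sample spaces is easy to carry out mechanically. The following Python sketch uses the binary-sequence convention for events on a three-point sample space and checks properties I and II together with Eq. 2.9. The point probabilities in `POINT_PROBS` are hypothetical values chosen only so that they are non-negative and sum to one.

```python
from fractions import Fraction
from itertools import product

# Hypothetical point probabilities P_1, P_2, P_3 (non-negative, summing to 1).
POINT_PROBS = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))

def event_probability(bits, point_probs=POINT_PROBS):
    """Eq. 2.11b: sum P_i over the points whose digit in `bits` is 1."""
    return sum((p for bit, p in zip(bits, point_probs) if bit), Fraction(0))

def all_events(k):
    """All 2**k events as k-digit binary tuples; index 3 is (0, 1, 1), i.e. A_3."""
    return list(product((0, 1), repeat=k))

def union(a, b):
    return tuple(x | y for x, y in zip(a, b))

def intersection(a, b):
    return tuple(x & y for x, y in zip(a, b))

def check_properties(k=3):
    """Assert properties I and II and Eq. 2.9; return the number of events."""
    events = all_events(k)
    for a in events:
        assert 0 <= event_probability(a) <= 1                   # property I
        for b in events:
            lhs = event_probability(union(a, b))
            rhs = (event_probability(a) + event_probability(b)
                   - event_probability(intersection(a, b)))
            assert lhs == rhs                                    # Eq. 2.9
    assert event_probability((1,) * k) == 1                      # property II
    return len(events)

if __name__ == "__main__":
    print(check_properties(), event_probability((0, 1, 1)))      # P[A_3] = P_2 + P_3
```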

Real-line sample spaces. In sample spaces that contain an infinite number
of sample points, events and probabilities may be assigned with
considerably more freedom than in finite sample spaces. Consider, for
example, the case in which the sample space is the real-line interval
$\Omega = \{\omega : 0 \le \omega \le 1\}$. A possible probability system, which we have
already briefly encountered, results when events are intervals of this line
segment, plus unions, intersections, and complements of such intervals.
The intervals may include both, one, or neither of the end points. A
possible probability assignment is then one in which the probability of an
event is the sum of the lengths of the disjoint intervals that constitute the
event. For example, if the event A illustrated in Fig. 2.5 is defined as

$$A = \{\omega : (a_1 < \omega < a_2) \text{ or } (a_3 < \omega < a_4) \text{ or } (a_5 < \omega < a_6) \text{ or } (a_7 < \omega < a_8)\}$$

with

$$0 \le a_i < a_{i+1} \le 1 \quad \text{for all } i,$$

then

$$P[A] = (a_2 - a_1) + (a_4 - a_3) + (a_6 - a_5) + (a_8 - a_7).$$

Figure 2.5 An event on the sample space $\Omega = \{\omega : 0 \le \omega \le 1\}$. The event A is the
union of the shaded intervals.


A convenient way to describe the probability system considered above
is to write

$$P[A] = \int_I f(\omega)\,d\omega, \tag{2.12a}$$

where the integration is over the intervals constituting the event A and

$$f(\omega) = \begin{cases} 1; & 0 \le \omega \le 1 \\ 0; & \text{elsewhere.} \end{cases} \tag{2.12b}$$

For the event A defined above and the probability assignment of
Eq. 2.12, P[A] is given by the length of the shaded area in Fig. 2.5. It is
clear from the figure that this probability system satisfies properties I, II,
and III. It is also clear, in contradistinction to the case of a finite sample
space, that the most general probability assignment to events is not built
up from probabilities assigned to individual sample points. The probability
assigned to any point $\omega$ by Eq. 2.12 is zero; obviously this conveys no
knowledge about the probability assigned to an interval.

The probability assigned to an interval cannot be an arbitrary function
if properties I to III are to hold. For example, if the probability of an
interval A were chosen to be the square of its length, properties I and II
would hold for the unit-interval sample space, but property III would not.
In particular, if

$$A = \{\omega : a < \omega \le c\},$$

we would have

$$P[A] = (c - a)^2. \tag{2.13}$$

Alternatively, however, we can write

$$A = \{\omega : a < \omega \le b\} \cup \{\omega : b < \omega \le c\}.$$

Since the two events on the right-hand side are disjoint, property III states
that

$$P[A] = (b - a)^2 + (c - b)^2,$$

which is inconsistent with Eq. 2.13. Using two different methods of
calculation, we get two different answers for P[A], and therefore the
probability assignment is invalid. In assigning probabilities we must be
careful to preclude the possibility of inconsistency.

A general probability assignment to intervals on $[-\infty, \infty]$ which is
always valid is

$$P[A] = \int_I f(\omega)\,d\omega, \tag{2.14a}$$

where $f(\omega)$ can be any integrable non-negative function such that

$$\int_{-\infty}^{\infty} f(\omega)\,d\omega = 1 \tag{2.14b}$$

and I is the set of sample points constituting A. Equation 2.14a is analogous
to the summation of Eq. 2.11b for finite sample spaces. Examples of
appropriate functions $f(\omega)$ are shown in Fig. 2.6.

Figure 2.6 Examples of functions for probability assignment to real-line
sample spaces.
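The inconsistency of the squared-length rule, and the consistency of the length rule, can be seen with a few lines of arithmetic. The Python sketch below compares the probability assigned to an interval (a, c] with the sum assigned to its two disjoint pieces (a, b] and (b, c]. The function names and the particular end points are illustrative assumptions.

```python
from fractions import Fraction

def length_rule(a, c):
    """P[(a, c]] = c - a: Eq. 2.14a with f = 1 on the unit interval."""
    return c - a

def squared_length_rule(a, c):
    """Candidate rule P[(a, c]] = (c - a)**2, as in Eq. 2.13."""
    return (c - a) ** 2

def additivity_gap(rule, a, b, c):
    """P[(a, c]] minus (P[(a, b]] + P[(b, c]]); zero when property III holds."""
    return rule(a, c) - (rule(a, b) + rule(b, c))

if __name__ == "__main__":
    a, b, c = Fraction(1, 10), Fraction(2, 5), Fraction(9, 10)
    print(additivity_gap(length_rule, a, b, c))           # no gap
    print(additivity_gap(squared_length_rule, a, b, c))   # gap of 2(b - a)(c - b)
```

For the squared-length rule the gap equals 2(b − a)(c − b), which is nonzero whenever b lies strictly between a and c.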


Relation of the Model to the Real World


The function of a mathematical model in engineering is to permit the 
prediction, by calculation, of observable results in the real world. The 
utility of probability theory derives from the fact that it enables us to make 
precise mathematical statements that mirror the statistical regularity 
observed in nature. Before we can discuss this mirroring in a meaningful 
way, however, it is necessary to construct a mathematical model for a 
compound experiment; that is, an experiment which itself consists of a
sequence of N independent trials of a simpler experiment. To do so we
first consider relative frequency in more detail. Our objective is to 
discover how to assign probabilities meaningfully in the mathematical 
model of a compound experiment. 

Consider the compound experiment that consists of two independent
trials of a simple experiment, one result of which is A. In the compound
experiment a set of possible results consists of the four sequences of
observations (A, A), (A, B), (B, A), and (B, B), where the first entry
denotes the result of the first trial, the second entry denotes the result of
the second, and $B = A^c$. For N = 10 independent repetitions of the
compound experiment, a typical sequence of results might be

Trial Number    Result
     1          B, B
     2          A, B   ✓
     3          B, A
     4          B, A
     5          B, B
     6          A, A   ✓
     7          A, B   ✓
     8          B, B
     9          B, B
    10          A, B   ✓

The relative frequency of a particular compound result, say (A, B), can
be calculated in either of the two following ways. The direct method is to
count the number of occurrences, N(A, B), and divide by N:

$$f_N(A, B) = \frac{N(A, B)}{N}.$$

An indirect method is to check (as shown) all results that begin with A
and to calculate the fraction of checked results that end with B. We call
this fraction the “conditional relative frequency of B on the second trial,
given A on the first trial,” and denote it $f_N(B \mid A)$. For our example
$f_N(B \mid A) = 3/4$. Then $f_N(A, B)$ is also given precisely by the alternative
expression

$$f_N(A, B) = f_N(A)\, f_N(B \mid A), \tag{2.15a}$$

where $f_N(A)$ is the relative frequency with which A occurs on the first
trial. In our example $f_N(A) = 4/10$. Both methods of calculation yield
$f_N(A, B) = 3/10$.
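The two methods of calculation can be reproduced directly from the ten tabulated results. The Python sketch below encodes the table as a list of pairs; the function names are illustrative assumptions.

```python
from fractions import Fraction

# The ten compound results tabulated in the text: (first trial, second trial).
RESULTS = [("B", "B"), ("A", "B"), ("B", "A"), ("B", "A"), ("B", "B"),
           ("A", "A"), ("A", "B"), ("B", "B"), ("B", "B"), ("A", "B")]

def direct(results, first, second):
    """Direct method: N(first, second) / N."""
    hits = sum(1 for r in results if r == (first, second))
    return Fraction(hits, len(results))

def first_trial_frequency(results, first):
    """Relative frequency with which `first` occurs on the first trial."""
    return Fraction(sum(1 for r in results if r[0] == first), len(results))

def conditional(results, first, second):
    """Conditional relative frequency of `second`, given `first` on the first trial."""
    checked = [r for r in results if r[0] == first]    # the checked rows
    return Fraction(sum(1 for r in checked if r[1] == second), len(checked))

if __name__ == "__main__":
    print(direct(RESULTS, "A", "B"))
    print(first_trial_frequency(RESULTS, "A") * conditional(RESULTS, "A", "B"))
```

Both printed values are the same fraction, as Eq. 2.15a requires.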

Although these manipulations appear trivial, the formulation in terms
of conditional relative frequency permits us to exploit the fact that the
trials in each sequence-of-two are independent. Independence implies
that the result of one trial of the simple experiment does not affect the
result of another. Thus, when N is sufficiently large, we usually observe
that both $f_N(B \mid A)$ and $f_N(B)$ stabilize at the same numerical value. In
a long sequence of independent repetitions of pairs of trials we therefore
expect that the following approximation to Eq. 2.15a will be valid:

$$f_N(A, B) \approx f_N(A)\, f_N(B). \tag{2.15b}$$

For instance, in coin tossing we anticipate that the over-all frequency of
Tails and the frequency with which Tails follows Heads will both be near
1/2. Therefore we expect that $f_N(H, T)$ will be near 1/4 for large N.

Similarly, if a compound experiment consists of M independent trials
of a simple experiment, the relative frequency with which the result is any
particular sequence such as (A, B, B, . . . , A) is usually observed to
approximate the M-term product of the relative frequencies of the result’s
constituents:

$$f_N(A, B, B, \ldots, A) \approx f_N(A)\, f_N(B)\, f_N(B) \cdots f_N(A). \tag{2.15c}$$

With this background, we can discuss the problem of determining a
mathematical model to represent a sequence of M trials of a real-world
experiment. Assume that we have already determined a probability system
that adequately represents a single trial of the experiment and that our only
interests are in some particular event A having probability P[A] and in
the complementary event $B = A^c$. We now construct a new probability
system appropriate for modeling the sequence of M independent trials.
The sample space of the new system consists of $2^M$ points, each of which
stands for one of the possible sequences of length M constructible from
A and B. For example, if M = 3, there are eight sample points.

We are guided in assigning probabilities to these $2^M$ points by a desire
that probabilities should act as relative frequencies. Accordingly, in our
new system we mirror Eq. 2.15c and assign to each sequence (sample point)
a probability equal to the product of the probabilities of its constituents.
Thus, if P[A] = p, which implies P[B] = 1 − p, we assign events and
probabilities to sample points as shown in Table 2.1 for M = 3. In order
ultimately to establish a tie between the repeated physical experiment and
our mathematical model we also associate with each sample point a
number m(A) equal to the fractional number of times A occurs in the
corresponding sequence:

$$m(A) = \frac{M(A)}{M}, \tag{2.16a}$$

where M(A) is the number of A’s in a sequence.


Table 2.1 Probability assignment, M = 3

Sample Point    Event    Probability
$\omega_1$      AAA      $p^3$
$\omega_2$      AAB      $p^2(1 - p)$
$\omega_3$      ABA      $p^2(1 - p)$
$\omega_4$      BAA      $p^2(1 - p)$
$\omega_5$      BBA      $p(1 - p)^2$
$\omega_6$      BAB      $p(1 - p)^2$
$\omega_7$      ABB      $p(1 - p)^2$
$\omega_8$      BBB      $(1 - p)^3$

For this example the probability of the event m(A) = 2/3 is

$$P\left[m(A) = \tfrac{2}{3}\right] = P[\{\omega_2, \omega_3, \omega_4\}] = \sum_{i=2}^{4} P[\omega_i] = 3p^2(1 - p).$$

For general M the probability that m(A) = k/M is

$$P\left[m(A) = \frac{k}{M}\right] = \binom{M}{k} p^k (1 - p)^{M-k}; \qquad 0 \le k \le M, \tag{2.16b}$$

where

$$\binom{M}{k} = \frac{M!}{k!\,(M - k)!}. \tag{2.16c}$$

To show this, we first note that probability $p^k(1 - p)^{M-k}$ is assigned to
each sequence that contains exactly k A’s. It is well known that there are
$\binom{M}{k}$ distinct sequences of k A’s and M − k B’s. Since each distinct
sequence corresponds to a sample point, there must be $\binom{M}{k}$ sample
points for which m(A) = k/M, each having probability $p^k(1 - p)^{M-k}$.
Equation 2.16b then follows immediately from property III.

The probability assignment of Eq. 2.16b is called the binomial distribution,
and the $\binom{M}{k}$ are the binomial coefficients. We check that the
binomial probabilities sum to unity (as they must to satisfy property II)
by invoking the binomial theorem

$$(a + b)^M = \sum_{k=0}^{M} \binom{M}{k} a^k b^{M-k}, \tag{2.17}$$

and obtain

$$\sum_{k=0}^{M} P\left[m(A) = \frac{k}{M}\right] = [p + (1 - p)]^M = 1.$$

Plots of P[m(A) = k/M] are given in Fig. 2.7 for M = 16, 100, 400, and
for p = 0.1 and 0.5.
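The counting argument behind Eq. 2.16b can be checked by brute force for small M. The Python sketch below assigns to each of the $2^M$ sequences the product of its constituents' probabilities, groups the sequences by the number of A's, and compares the totals with the binomial formula. The value of p and the function names are illustrative assumptions; `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import comb

def enumerate_distribution(M, p):
    """P[m(A) = k/M] for k = 0..M, found by summing over all 2**M sequences."""
    dist = [Fraction(0)] * (M + 1)
    for seq in product("AB", repeat=M):           # one sample point per sequence
        k = seq.count("A")
        dist[k] += p ** k * (1 - p) ** (M - k)    # product of constituent probabilities
    return dist

def binomial(M, k, p):
    """Right-hand side of Eq. 2.16b."""
    return comb(M, k) * p ** k * (1 - p) ** (M - k)

if __name__ == "__main__":
    p = Fraction(1, 10)
    dist = enumerate_distribution(3, p)
    print(dist[2], 3 * p ** 2 * (1 - p))          # the M = 3, k = 2 case of Table 2.1
    print(sum(dist))                              # binomial probabilities sum to unity
```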

Our primary interest is in the probabilistic behavior of m(A) when M
is large. For any small positive number $\epsilon$, let us consider the event

$$\{\omega : |m(A) - p| \ge \epsilon\}.$$

Thus

$$P[|m(A) - p| \ge \epsilon] = \sum_{k \in I} \binom{M}{k} p^k (1 - p)^{M-k}, \tag{2.18a}$$

where

$$I = \left\{k : \frac{k}{M} \le p - \epsilon \ \text{ or } \ \frac{k}{M} \ge p + \epsilon\right\}. \tag{2.18b}$$

From Fig. 2.7 it is quite clear that, for any $\epsilon$, this probability tends to zero
as M becomes large. Indeed, we shall see later that

$$P[|m(A) - p| \ge \epsilon] \le \frac{p(1 - p)}{M\epsilon^2} \tag{2.19a}$$

and even more strongly that

$$P[|m(A) - p| \ge \epsilon] \le 2e^{-\alpha M}, \tag{2.19b}$$

where $\alpha$ is a positive number independent of M.
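The decay of this probability with M can be examined numerically by summing the binomial terms over the index set I of Eq. 2.18b. The Python sketch below does so exactly with rational arithmetic and compares the result with the bound p(1 − p)/(Mε²) of Eq. 2.19a. The choices p = 1/2, ε = 1/10 and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def tail_probability(M, p, eps):
    """Eq. 2.18: P[|m(A) - p| >= eps], summed exactly over the index set I."""
    total = Fraction(0)
    for k in range(M + 1):
        m = Fraction(k, M)
        if m <= p - eps or m >= p + eps:           # k belongs to I (Eq. 2.18b)
            total += comb(M, k) * p ** k * (1 - p) ** (M - k)
    return total

def chebyshev_bound(M, p, eps):
    """Right-hand side of Eq. 2.19a: p(1 - p) / (M eps^2)."""
    return p * (1 - p) / (M * eps ** 2)

if __name__ == "__main__":
    p, eps = Fraction(1, 2), Fraction(1, 10)
    for M in (16, 100, 400):
        print(M, float(tail_probability(M, p, eps)),
              float(chebyshev_bound(M, p, eps)))
```

For these values the exact probability falls much faster with M than the bound of Eq. 2.19a, which decreases only as 1/M.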

The number m(A) in the mathematical model of a compound experiment
has been defined in a manner that makes it directly analogous to relative
frequency; we have $f_N(A) = N(A)/N$ and $m(A) = M(A)/M$. Equation
2.19 states, in addition, that m(A) exhibits properties that mirror those of
relative frequency in nature: m(A) is close to the number P[A] with high
probability when M is large, just as $f_N(A)$ almost always stabilizes close
to this same number when N is large. Furthermore, the low-probability
event that m(A) is very different from P[A] mirrors atypical results in the
real world such as observing the relative frequency of Heads to be close
to unity in a long sequence of independent coin tosses. We say that such
sequences are unlikely; the mathematical model says that they are
improbable. We connect the model of probability theory with the real
world by saying that we do not expect to observe a particular experimental
result if it corresponds to an event of low probability. Thus in a long
sequence of independent trials we expect the measured relative frequency
of a result to converge to the probability of the corresponding mathematical
event. As in Newtonian mechanics, probability theory is ultimately
justified by the fact that it predicts (in this case, the relative frequency)
successfully.

Figure 2.7 Binomial probabilities and behavior of P[m(A)]. The horizontal
axis is m(A) = k/M. The heavy line segment along the horizontal axis
indicates the interval p ± 0.1 in (a), (b), (c) and the interval p ± 0.05 in
(d), (e), (f).

Naturally, the success of a mathematical prediction depends not only 
on the rules used in calculating but also on the accuracy of the original 
numerical data. For instance, the mass of a mathematical body in 
mechanics must approximate the mass of the physical body. In application, 
probabilities are usually assigned initially to fairly simple events; then 
we proceed to calculate the probabilities of other, more complex, events. 
Care must be taken that the original assignment is realistic. For example, 
one objective of communication theory is the design of communication 
systems that operate over noisy channels with a minimum probability of 
error. Successful engineering results are obtained only if the mathematical 
model of the channel adequately reflects the true nature of the disturbance. 

In many cases study of the physics underlying a random phenomenon 
leads to a proper initial probability assignment; we shall see that tran- 
sistor and vacuum tube noise can be treated in this way. In some cases 
symmetry provides the starting point; for instance, it is reasonable to
assign probability 1/2 to Heads in coin tossing. In other cases we make
recourse to the observation of relative frequencies: life insurance rates
are based on mortality experience tables. The unavoidable hazard here, 
of course, is that the observed frequencies may not be typical. In any 
event, the final test of validity is always whether or not predictions based 
on the original data are accurate enough to be useful. 


Conditional Probability 


In dealing with repeated trials of a physical experiment, we have
introduced the concept of conditional relative frequency. It is convenient
to introduce a corresponding concept into the mathematical model.
Given any two events A and B, we define the conditional probability
P[A | B] of an event A as

$$P[A \mid B] \triangleq \frac{P[AB]}{P[B]} \tag{2.20}$$

whenever P[B] ≠ 0. When P[A] is also nonzero, it follows that

$$P[AB] = P[A \mid B]\, P[B] = P[B \mid A]\, P[A]. \tag{2.21}$$

Since the intersection of B with itself is B,

$$P[B \mid B] = 1. \tag{2.22}$$

Conditional probabilities serve to narrow consideration to a subspace
B of a sample space $\Omega$. This can easily be visualized with the help of Fig.
2.8, in which we show a sample space $\Omega$ on which several events $\{A_i\}$ are
defined. The shaded area to the left of the dotted line is another event B.
It is useful to think of “conditioning” as a means of generating a new
probability system from a given one:

1. The new sample space, say $\Omega'$, is the original event B.

2. The new events, say $\{A_i'\}$, are the original intersections $\{A_i B\}$.

3. The new probabilities, $\{P[A_i']\}$, are the conditional probabilities
$\{P[A_i \mid B]\}$.

This probability assignment to $\Omega'$ satisfies the necessary properties.

I. Since $0 \le P[A_i B] \le P[B]$, we have $0 \le P[A_i'] \le 1$.

II. By Eq. 2.22, $P[\Omega'] = P[B \mid B] = 1$.

III. If $A_i' A_j' = \emptyset$, then $(A_i B) \cap (A_j B) = \emptyset$ and

$$\begin{aligned}
P[A_i' \cup A_j'] &= P[A_i B \cup A_j B \mid B] \\
&= \frac{P[A_i B \cup A_j B]}{P[B]} \\
&= \frac{P[A_i B] + P[A_j B]}{P[B]} \\
&= P[A_i'] + P[A_j'].
\end{aligned}$$

Since conditional probabilities can be considered as ordinary probabilities
defined on a new sample space, all statements and theorems about ordinary
probabilities also hold true for conditional probabilities. In particular,
if the set of intersections $\{A_i B\}$ is disjoint and if

$$\bigcup_{\text{all } i} (A_i B) = B, \tag{2.23a}$$

then

$$P[B] = \sum_i P[A_i B] = \sum_i P[B]\, P[A_i \mid B] \tag{2.23b}$$

and

$$1 = \sum_i P[A_i \mid B]. \tag{2.24}$$

Equation 2.23 is called the theorem of total probability. It corresponds
to the geometrical axiom that the whole equals the sum of its parts.
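Conditioning as the generation of a new probability system can be illustrated on a small finite sample space. In the Python sketch below the six equally likely sample points, the conditioning event `B`, and the events in `PARTITION` are hypothetical choices made only for the example. The events are chosen so that their intersections with B are disjoint and together make up B, as Eq. 2.23a requires.

```python
from fractions import Fraction

# Hypothetical finite system: six equally likely sample points.
PROB = {w: Fraction(1, 6) for w in range(1, 7)}

def P(event):
    """Probability of an event: the sum of the probabilities of its points."""
    return sum((PROB[w] for w in event), Fraction(0))

def conditional(a, b):
    """Eq. 2.20: P[A | B] = P[AB] / P[B], defined only when P[B] != 0."""
    if P(b) == 0:
        raise ValueError("P[B] must be nonzero")
    return P(a & b) / P(b)

B = frozenset({1, 2, 3, 4})                                  # conditioning event
PARTITION = [frozenset({1, 2}), frozenset({3, 5}), frozenset({4, 6})]  # A_1, A_2, A_3

if __name__ == "__main__":
    print([conditional(a, B) for a in PARTITION])
    print(sum(P(a & B) for a in PARTITION) == P(B))           # Eq. 2.23b
    print(sum(conditional(a, B) for a in PARTITION))          # Eq. 2.24
```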

Statistical Independence 

As interpreted, conditional probability is directly analogous to conditional
relative frequency in a physical experiment; in both cases we
consider only the subset of possibilities that satisfies the condition. In a
long sequence of independent experimental trials we therefore anticipate
that a conditional relative frequency will stabilize at the corresponding
conditional probability.

If the joint probability† of two events A and B satisfies

$$P[A, B] = P[A]\, P[B], \tag{2.25a}$$

or equivalently

$$P[A \mid B] = P[A], \tag{2.25b}$$

we call the pair of events statistically independent. Equation 2.25 mirrors
the corresponding approximate relationship for relative frequency with
independent trials given by Eq. 2.15b.

† The probability of the intersection AB of two events A and B is frequently written
P[A, B], instead of P[AB], and referred to as the probability of the “joint event A and
B.” The notation arises naturally in modeling a sequence of trials, as in Table 2.1.

A set of k events $\{A_i\}$ is defined as statistically independent if and only
if the probability of every intersection of k or fewer events equals the
product of the probabilities of the constituents. Thus three events A, B,
C are statistically independent when

$$P[A, B] = P[A]\, P[B]$$
$$P[A, C] = P[A]\, P[C] \tag{2.26}$$
$$P[B, C] = P[B]\, P[C],$$

and

$$P[A, B, C] = P[A]\, P[B]\, P[C]. \tag{2.27}$$

Figure 2.9 Independence and dependence of three events. The conditions
are (a) P[AB] = P[A] P[B], (b) P[AC] = P[A] P[C], (c) P[BC] = P[B] P[C],
(d) P[ABC] = P[A] P[B] P[C]. The three panels show: (a) independent
events, for which a, b, c, d are satisfied; (b) pairwise independent events,
for which a, b, c are satisfied and d is not; (c) dependent events, for which
a, b, d are satisfied and c is not.

No three of these relations necessarily implies the fourth. If only Eq. 
2.26 is satisfied, we say that the events are pairwise independent. Pairwise 
independence does not imply complete independence. Various possibilities 
are given in Fig. 2.9. 
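A standard example, not taken from the text, shows events that satisfy Eq. 2.26 but not Eq. 2.27. Take two tosses of a fair coin and let A be “first toss Heads,” B be “second toss Heads,” and C be “the two tosses agree.” The Python sketch below checks the four product relations with exact arithmetic; the event definitions and function names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Two tosses of a fair coin: four equally likely sample points.
PROB = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
        "TH": Fraction(1, 4), "TT": Fraction(1, 4)}

A = frozenset({"HH", "HT"})    # first toss is Heads
B = frozenset({"HH", "TH"})    # second toss is Heads
C = frozenset({"HH", "TT"})    # the two tosses agree

def P(event):
    return sum((PROB[w] for w in event), Fraction(0))

def pairwise_independent(events):
    """True when every pair satisfies the product rule (Eq. 2.26)."""
    return all(P(x & y) == P(x) * P(y) for x, y in combinations(events, 2))

def triple_product_holds(a, b, c):
    """True when P[A, B, C] = P[A] P[B] P[C] (Eq. 2.27)."""
    return P(a & b & c) == P(a) * P(b) * P(c)

if __name__ == "__main__":
    print(pairwise_independent([A, B, C]), triple_product_holds(A, B, C))
```

Here each pair has joint probability 1/4, equal to the product of two probabilities of 1/2, while the triple intersection also has probability 1/4 rather than 1/8.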
An urn problem. The use of conditional probability often simplifies
the assignment of probabilities to the joint occurrence of two events.
Consider the urn problem, for example, in which we draw two balls at
random from an urn containing one black and two red balls. When we
draw two balls without replacement, the only possible outcome sequences
are (R, R), (R, B), and (B, R). In the mathematical model we employ a
subscript to denote the draw. For the first draw we set $P[R_1] = 2/3$,
$P[B_1] = 1/3$. For the second draw we set

$$P[R_2 \mid R_1] = \tfrac{1}{2} \qquad P[R_2 \mid B_1] = 1$$

$$P[B_2 \mid R_1] = \tfrac{1}{2} \qquad P[B_2 \mid B_1] = 0,$$

where the conditioning is on the result of the first draw. Thus

$$P[R_1, R_2] = P[R_1]\, P[R_2 \mid R_1] = \tfrac{1}{3}$$

$$P[R_1, B_2] = P[R_1]\, P[B_2 \mid R_1] = \tfrac{1}{3}$$

$$P[B_1, R_2] = P[B_1]\, P[R_2 \mid B_1] = \tfrac{1}{3}.$$
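The joint probabilities of the urn problem can be cross-checked by enumerating the equally likely ordered draws of two distinct balls. The Python sketch below computes them both by enumeration and by the product of a first-draw probability and a conditional second-draw probability; the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

URN = ("B", "R", "R")   # one black ball and two red balls

def joint_by_enumeration():
    """Joint probabilities of (first, second) over equally likely ordered draws."""
    draws = list(permutations(range(len(URN)), 2))    # ordered pairs of distinct balls
    joint = {}
    for i, j in draws:
        key = (URN[i], URN[j])
        joint[key] = joint.get(key, Fraction(0)) + Fraction(1, len(draws))
    return joint

def joint_by_conditioning(first, second):
    """P[first on draw 1, second on draw 2] = P[first] * P[second | first]."""
    p_first = Fraction(URN.count(first), len(URN))
    remaining = list(URN)
    remaining.remove(first)                           # draw without replacement
    return p_first * Fraction(remaining.count(second), len(remaining))

if __name__ == "__main__":
    print(joint_by_enumeration())
    print(joint_by_conditioning("R", "B"), joint_by_conditioning("B", "B"))
```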

A communication problem. A second, particularly germane, example of
the utility of conditional probability is the following idealized
communication problem. Consider a mathematical model of a discrete
communication channel having M possible input messages {m_i},
0 ≤ i ≤ M - 1, and J possible output symbols {r_j}, 0 ≤ j ≤ J - 1.
For purposes of this example the channel model may be completely
described by a set of MJ conditional probabilities, {P[r_j | m_i]},
that specifies the probability of receiving each output conditioned on
each input. For small values of MJ it is convenient to diagram these
conditional probabilities (often called “transition probabilities” in
a communication context) as shown in Fig. 2.10.



Figure 2.10 Transition probability diagram, M = 2, J = 3.

The model introduced above describes an actual communication system
such as that diagrammed in Fig. 1.7 when the entire system from source
output to decision element input is considered, for purposes of
analysis, to be the “channel.” Specifically, the model results when
there is only one detector (as in Fig. 1.8) and its output (point a)
is constrained to assume one of a set of J discrete values. Under these
conditions the design of the “receiver” amounts to specification of the
decision element.

Assume that we know the set of M probabilities {P[m_i]} with which
the input messages occur. These probabilities are called the a priori
message probabilities (meaning the probabilities before reception). Our
problem is to specify a receiver that, on the basis of the symbol r_j
received, makes the optimum decision regarding which message m_i was
transmitted. We define optimum to mean that the probability of deciding
correctly, denoted P[C], is maximum. In a long sequence of independent
transmissions we therefore expect the optimum receiver to decide
correctly more often than any nonoptimum receiver.

A single operation of the channel can be described on a sample space
Ω comprising MJ sample points ω, each labeled with one of the possible
input-output pairs (m_i, r_j). Probabilities are assigned to these
points by the equation

    P[m_i, r_j] = P[m_i] P[r_j | m_i].        (2.28a)

We can then use Eq. 2.23b to calculate such quantities as

    P[r_j] = \sum_{i=0}^{M-1} P[m_i, r_j]        (2.28b)

and

    P[m_i | r_j] = \frac{P[m_i, r_j]}{P[r_j]}.        (2.28c)

An example of a typical probability system, with M = 2 and J = 3, is
illustrated in Fig. 2.11.

Before a transmission the a priori probability that any particular input
m_i will be transmitted is P[m_i]. After a transmission, given that r_j
is received, the probability that m_i was transmitted is P[m_i | r_j],
which is called the a posteriori probability. The effect of the
transmission is to alter the probability of each possible input from
its a priori to its a posteriori value.

The specification of a receiver amounts to the specification of a mapping
from the channel output space {r_j} onto the message input space {m_i}:
each possible received symbol r_j must be attributed to one and only one
of the possible inputs. Let m(j) denote the particular input in the set
{m_i} to which a receiver attributes r_j. Then the conditional probability

Figure 2.11 Probability system for communication example. The
probability of each input-output pair (m_i, r_j) is represented by its
area.

P[C | r_j] of a correct decision, given that r_j is received, is just the
probability that m(j) was in fact transmitted. We write

    P[C | r_j] = P[m(j) | r_j].        (2.29)

Obviously, P[C | r_j] is maximized by choosing m(j) to be that member of
{m_i} with the largest a posteriori probability. This maximum
a posteriori probability decision rule, applied to each possible received
symbol r_j, determines the optimum receiver. If several m_i have the same
(maximum) a posteriori probability, r_j can be arbitrarily assigned to
any one without loss of optimality.

That this decision rule is optimum becomes clear when we use Eq. 2.29
to compute the unconditioned probability of a correct decision, P[C]:

    P[C] = \sum_{j=0}^{J-1} P[C | r_j] P[r_j].        (2.30)

The positive quantities P[r_j] are independent of the decision rule, and
therefore the sum on j is maximized if and only if each of the terms
P[C | r_j] is maximum.

It is not necessary to compute the probability P[r_j] in order to
determine the optimum mapping and the resulting probability of error.
From Eq. 2.28, m_k has maximum a posteriori probability,

    P[m_k | r_j] > P[m_i | r_j]    for all i \ne k,        (2.31a)

hence m(j) = m_k, if and only if

    P[m_k] P[r_j | m_k] > P[m_i] P[r_j | m_i]    for all i \ne k.        (2.31b)
Once the set {m(j)}, j = 0, 1, ..., J - 1, is determined from Eq. 2.31b,
the probability of correct decision, P[C], can be calculated from the
equation

    P[C] = \sum_{j=0}^{J-1} P[m(j), r_j],        (2.32a)

where P[m(j), r_j] denotes the joint probability that m(j) is transmitted
and r_j received.

Finally, the probability of error, P[E], is given by

    P[E] = 1 - P[C].        (2.32b)

An Example. In Fig. 2.12a we show a binary channel with two input

Figure 2.12 A binary communication channel.

symbols {a, b} and two output symbols {0, 1}. The input probabilities
are

    P[a] = 0.6        P[b] = 0.4.

The channel transition probabilities are

    P[0 | a] = 0.2        P[0 | b] = 0.7

    P[1 | a] = 0.8        P[1 | b] = 0.3.

Thus the probabilities of the four possible input-output pairs, as shown
in Fig. 2.12b, are

    P[a, 0] = P[a] P[0 | a] = 0.6 × 0.2 = 0.12

    P[a, 1] = P[a] P[1 | a] = 0.6 × 0.8 = 0.48

    P[b, 0] = P[b] P[0 | b] = 0.4 × 0.7 = 0.28

    P[b, 1] = P[b] P[1 | b] = 0.4 × 0.3 = 0.12.
Since

    P[b, 0] > P[a, 0]

    P[a, 1] > P[b, 1],

the optimum receiver is specified by the mapping

    m(0) = b
    m(1) = a.

From Eq. 2.32a,

    P[C] = P[b, 0] + P[a, 1] = 0.76

and

    P[E] = 1 - P[C] = 0.24.

The sample points corresponding to error are shaded in Fig. 2.12b.
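
The decision rule of Eq. 2.31b and the sum of Eq. 2.32a translate
directly into a few lines of code. The sketch below is illustrative
only; the function name map_receiver and the dictionary layout are
assumptions made for this example, and exact fractions are used so that
the binary-channel numbers above are reproduced without rounding.

```python
from fractions import Fraction


def map_receiver(prior, transition):
    """Return (decision, p_correct) for a discrete channel.

    prior:      {m: P[m]}
    transition: {m: {r: P[r | m]}}
    decision:   {r: m(r)}, the input with the largest joint probability
                P[m] P[r | m] for each output r (Eq. 2.31b).
    p_correct:  sum over r of P[m(r), r] (Eq. 2.32a).
    """
    outputs = sorted({r for row in transition.values() for r in row})
    decision = {}
    p_correct = Fraction(0)
    for r in outputs:
        joint = {m: prior[m] * transition[m].get(r, Fraction(0)) for m in prior}
        best = max(joint, key=joint.get)   # ties resolved arbitrarily
        decision[r] = best
        p_correct += joint[best]
    return decision, p_correct


# Binary channel of Fig. 2.12.
prior = {"a": Fraction(6, 10), "b": Fraction(4, 10)}
transition = {
    "a": {0: Fraction(2, 10), 1: Fraction(8, 10)},
    "b": {0: Fraction(7, 10), 1: Fraction(3, 10)},
}
decision, p_correct = map_receiver(prior, transition)
p_error = 1 - p_correct
print(decision, float(p_correct), float(p_error))
```

Note that the code never forms P[r_j]; as remarked after Eq. 2.30, the
joint probabilities alone determine the optimum mapping.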

2.3 RANDOM VARIABLES 

In many of the applications of probability theory (one is tempted to
say most) real numbers are associated with the points {ω} in a sample
space. For example, in discussing the mathematical model for a sequence
of M independent trials of an experiment, it was natural to assign to each
point ω a number m(A), chosen to equal the fractional occurrence of A
in the event sequence associated with ω. Another natural example, when
Ω is the real line, is to associate with each point ω the distance from
ω to the origin. Equally well, of course, we could associate with ω the
square of this distance.

The real number associated with a sample point ω is denoted x(ω).
In the general case, in which Ω is an abstract collection of points, x( )
may be viewed as a function that maps Ω into the real line: given any
point ω, the function x( ) specifies a finite real number x(ω). A simple
example of such mapping is illustrated in Fig. 2.13. When Ω itself is the

Figure 2.13 A mapping x( ) from Ω to the real line.

real line, examples might be x(ω) = ω, x(ω) = ω^2, or x(ω) = sin ω.
Hereafter, in referring to functions we often delete empty parentheses
and simply write x to denote the function x( ).

Distribution Functions 

Once x has been specified, we may inquire into the probability of
events such as

    A = \{\omega : a < x(\omega) \le b\},

    B = \{\omega : x(\omega) = c\},

    C = \{\omega : x(\omega) > d\},

and so on. The answer to any such question is readily obtained from

Figure 2.14 An example of a probability distribution function.

knowledge of the probability distribution function, F_x, defined as

    F_x(\alpha) = P[\{\omega : x(\omega) \le \alpha\}].        (2.33)

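Clearly, F_x is a function from the real line into the interval [0, 1].
For example, if

    \Omega = \{\omega_1, \omega_2, \omega_3\}

    P[\omega_1] = 1/4        P[\omega_2] = 1/4        P[\omega_3] = 1/2

    x(\omega_1) = 0          x(\omega_2) = 4          x(\omega_3) = 1,

then

    F_x(\alpha) = \begin{cases}
        0   & \text{for } \alpha < 0, \\
        1/4 & \text{for } 0 \le \alpha < 1, \\
        3/4 & \text{for } 1 \le \alpha < 4, \\
        1   & \text{for } 4 \le \alpha,
    \end{cases}

as shown in Fig. 2.14.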
Functions x for which probability distribution functions can be specified
are called random variables.† In this text we do not consider functions x
for which F_x does not exist. In an axiomatic treatment of probability
theory care must be taken to avoid the choice of a function x for which
some event {ω: x(ω) ≤ α} has not been assigned a probability.

The properties of distribution functions listed below follow directly 
from the definition of Eq. 2.33. 

I. F_x(α) ≥ 0; for -∞ < α < ∞.

II. F_x(-∞) = 0.

III. F_x(+∞) = 1.

IV. If a > b, F_x(a) - F_x(b) = P[{ω: b < x(ω) ≤ a}].

V. If a > b, F_x(a) ≥ F_x(b).

The first three properties follow from the facts that F_x(α) is a
probability and P[Ω] = 1. Properties IV and V follow from the fact that

    \{\omega : x(\omega) \le b\} \cup \{\omega : b < x(\omega) \le a\} = \{\omega : x(\omega) \le a\}.

Another property of distribution functions concerns the nature and
significance of discontinuities, such as those illustrated in Fig. 2.14.
Consider any positive number ε. Since F_x is defined in such a way that
F_x(a - ε) does not include the probability, say P_a, of the event
{ω: x(ω) = a}, whereas F_x(a) does include this probability, F_x(α) has a
discontinuity of magnitude P_a at the point α = a whenever P_a > 0.
Furthermore, F_x(a) is the value of F_x at the top of this discontinuity.
If P_a = 0, the height of the discontinuity is zero; that is, there is no
discontinuity. The properties of distribution functions are summarized by
remarking that F_x increases monotonically from 0 to 1, is continuous on
the right, and has a step of size P_a at point a if and only if

    P[\{\omega : x(\omega) = a\}] = P_a.

We are not restricted to assigning only one real number to each point
ω in a sample space. In general, we define many different mappings
(functions) from a single sample space Ω into the real line. Then we
have a set of coexisting random variables, say {x_i}, i = 1, 2, ..., k.

† The nomenclature is somewhat misleading. Actually, a random variable is
a well-defined function on the points of a sample space. The terminology
comes from the use of random variables as mathematical models for
quantities in the real world such as a noise voltage measured at some
time t_1.
First consider the case for which k is 2. Once the functions x_1 and x_2
are specified, we may inquire into the probability of joint events such as

    \{\omega : x_1(\omega) \le a_1,\ x_2(\omega) \le a_2\} = \{\omega : x_1(\omega) \le a_1\} \cap \{\omega : x_2(\omega) \le a_2\}.

In order to answer such questions, it is not sufficient to know only the
one-dimensional distributions, F_{x_1} and F_{x_2}. All such questions
can be

Figure 2.15a F_{x_1,x_2}(α_1, α_2) is the probability of the set of all ω
for which the point (x_1(ω), x_2(ω)) falls into the shaded region.

answered, however, through knowledge of the joint probability distribution
function, F_{x_1,x_2}, defined as

    F_{x_1,x_2}(\alpha_1, \alpha_2) = P[\{\omega : x_1(\omega) \le \alpha_1,\ x_2(\omega) \le \alpha_2\}]; \quad -\infty < \alpha_1, \alpha_2 < \infty.        (2.34)

Thus F_{x_1,x_2}(α_1, α_2) is the probability assigned to the set of all
points ω in Ω that are associated with the region of the two-dimensional
Euclidean space which is shaded in Fig. 2.15a.

The properties of joint distribution functions listed below follow 
directly from the definition of Eq. 2.34. 

I. F_{x_1,x_2}(α_1, α_2) ≥ 0; for -∞ < α_1 < ∞, -∞ < α_2 < ∞.

II. F_{x_1,x_2}(-∞, α) = F_{x_1,x_2}(α, -∞) = 0; for -∞ < α < ∞.

III. F_{x_1,x_2}(∞, ∞) = 1.

IV. F_{x_1,x_2}(∞, α) = F_{x_2}(α).

V. F_{x_1,x_2}(α, ∞) = F_{x_1}(α).

VI. If a_1 ≥ b_1 and a_2 ≥ b_2,

    F_{x_1,x_2}(a_1, a_2) \ge F_{x_1,x_2}(a_1, b_2) \ge F_{x_1,x_2}(b_1, b_2).
Properties I, II, III, and VI are self-evident. Properties IV and V are
consequences of the facts that {ω: x(ω) ≤ ∞} = Ω for any random
variable and the intersection of any event with Ω is the event itself.
Taken together, the properties imply that F_{x_1,x_2}(α_1, α_2) is a
monotonically increasing function of both arguments and that
0 ≤ F_{x_1,x_2}(α_1, α_2) ≤ 1. An example of a possible distribution
function F_{x_1,x_2} is shown in Fig. 2.15b.

Figure 2.15b Example of a two-dimensional distribution function,
F_{x_1,x_2}(α_1, α_2).

When k random variables, x_1, x_2, ..., x_k, are defined on Ω, it is
convenient to adopt a concise notation. Let x denote the k-tuple
(x_1, x_2, ..., x_k). We then define the k-dimensional joint probability
distribution function F_x(α) as

    F_{\mathbf{x}}(\boldsymbol{\alpha}) = P[\{\omega : x_1(\omega) \le \alpha_1,\ x_2(\omega) \le \alpha_2,\ \ldots,\ x_k(\omega) \le \alpha_k\}],        (2.35)

where α = (α_1, α_2, ..., α_k). We refer to x as a k-dimensional vector of
random variables or, more simply, as a random vector.
Two k-dimensional vectors, say

    \mathbf{a} = (a_1, a_2, \ldots, a_k)        (2.36a)

and

    \mathbf{b} = (b_1, b_2, \ldots, b_k),        (2.36b)

are said to satisfy the relationship

    \mathbf{a} \le \mathbf{b}        (2.36c)

if and only if the inequality holds for each pair of corresponding
components; that is, if and only if

    a_i \le b_i; \quad \text{for } i = 1, 2, \ldots, k.        (2.36d)

With this notation, we can rewrite Eq. 2.35 more concisely as

    F_{\mathbf{x}}(\boldsymbol{\alpha}) = P[\{\omega : \mathbf{x}(\omega) \le \boldsymbol{\alpha}\}].        (2.37)

A vector such as α in Eq. 2.37 may be thought of as designating a
point in a k-dimensional Euclidean geometry, the coordinates of the point
being the components of the vector. Similarly, a random vector x
designates a mapping from the sample space Ω into Euclidean k-space,
that is, x assigns a particular point x(ω) in Euclidean k-space to each
sample point ω in Ω. The inequality x ≤ α defines a region in Euclidean
k-space. The number F_x(α) is the probability of the set of sample points
ω mapped into the region x ≤ α by x(ω). Distribution functions in
k dimensions evidence properties that are straightforward generalizations
of those already discussed for one or two dimensions.

As an example of the calculation of joint probabilities, consider the 
three-dimensional vector 

    \mathbf{x} = (x_1, x_2, x_3),

the three functions

    x_1(\omega) = \omega,

    x_2(\omega) = \omega^2,

    x_3(\omega) = 1 - \omega,

and the probability assignment encountered in Eq. 2.12; that is, consider
Ω = {ω: 0 ≤ ω ≤ 1}, and the probability assignment given by

    P[A] = \int_A f(\omega)\, d\omega = \int_A d\omega,

where the integration is over the set of intervals constituting the
event A.
Figure 2.16 Examples of the evaluation of a joint distribution function.

From Fig. 2.16 we see that 

F x (0.5, 0.25, 1) = 0.5, 

F x (0.5, 0.25, 0.8) = 0.3, 

F x (0.8, 0.16, 0.8) = 0.2. 
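
The three values read from Fig. 2.16 can be checked numerically. In the
sketch below (an illustration, with the function name joint_cdf chosen
for this example), each of the three inequalities is converted into a
bound on ω, and the probability is the length of the resulting
subinterval of [0, 1], since the probability assignment is uniform.

```python
def joint_cdf(a1, a2, a3):
    """F_x(a1, a2, a3) for x1 = w, x2 = w**2, x3 = 1 - w, w uniform on [0, 1]."""
    if a2 < 0:
        return 0.0                       # w**2 <= a2 is impossible
    # x1 <= a1 and x2 <= a2 both bound w from above; so does w <= 1.
    upper = min(a1, a2 ** 0.5, 1.0)
    # x3 = 1 - w <= a3 bounds w from below; so does w >= 0.
    lower = max(1.0 - a3, 0.0)
    return max(upper - lower, 0.0)       # length of the admissible interval


for point in [(0.5, 0.25, 1.0), (0.5, 0.25, 0.8), (0.8, 0.16, 0.8)]:
    print(point, round(joint_cdf(*point), 6))
```
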

It is evident that the k-dimensional distribution function always
provides all information necessary for determining the probability of the
set {ω} such that x lies in any specified region of k-dimensional
Euclidean space. But the direct use of F_x is usually inconvenient in
computations. For example, from Fig. 2.17 it is clear that for the
probability of a rectangular region we have

    P[\{\omega : a < x_1(\omega) \le b,\ c < x_2(\omega) \le d\}]
        = F_{\mathbf{x}}(b, d) - F_{\mathbf{x}}(b, c) - F_{\mathbf{x}}(a, d) + F_{\mathbf{x}}(a, c).

Figure 2.17 Region of plane for which a < x_1 ≤ b, c < x_2 ≤ d.

The probability of even this simple event entails an expression with four
terms. In three dimensions the probability of a cubic region entails an
expression with eight terms, and so forth.
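
As a hedged illustration of the four-term expression, the sketch below
applies it to the pair x_1(ω) = ω, x_2(ω) = ω^2 from the preceding
example, for which the two-dimensional distribution function can be
written down directly. The function names and the particular rectangle
are assumptions chosen for this example.

```python
import math


def F(a1, a2):
    """Joint distribution of x1 = w, x2 = w**2 with w uniform on [0, 1]."""
    if a2 < 0:
        return 0.0
    # Both conditions bound w from above; clip the bound to [0, 1].
    return min(max(min(a1, math.sqrt(a2)), 0.0), 1.0)


def rectangle_probability(F, a, b, c, d):
    """P[a < x1 <= b, c < x2 <= d] from the four corner values of F."""
    return F(b, d) - F(b, c) - F(a, d) + F(a, c)


# Rectangle 0.2 < x1 <= 0.8, 0.09 < x2 <= 0.49 corresponds to 0.3 < w <= 0.7.
p = rectangle_probability(F, 0.2, 0.8, 0.09, 0.49)
print(round(p, 6))
```

The direct calculation (the length of the ω-interval from 0.3 to 0.7)
gives 0.4, in agreement with the four-term formula.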

Density Functions 

The notational inconvenience of the distribution function can be
avoided by introducing a function called the probability density function,
which permits probabilities to be written in the familiar form of
integrals. For the single random variable x, the probability that x lies
in a small interval [a, a + Δ] is

    P[\{\omega : a < x \le a + \Delta\}] = F_x(a + \Delta) - F_x(a)
        = \Delta \left[ \frac{F_x(a + \Delta) - F_x(a)}{\Delta} \right].

If Δ is very small and F_x(α) is differentiable at α = a, the term in
brackets is approximately

    \frac{F_x(a + \Delta) - F_x(a)}{\Delta} \approx \left. \frac{dF_x(\alpha)}{d\alpha} \right|_{\alpha = a}; \quad \Delta > 0        (2.38a)

and

    P[\{\omega : a < x \le a + \Delta\}] \approx \Delta\, F_x'(a),        (2.38b)

in which the prime denotes the derivative of F_x.

Now consider a region, say I, of the real line: any such region may be
thought of as the union of a large number of disjoint intervals, each of
which has length Δ, as shown in Fig. 2.18. Furthermore, the probability
of a union of disjoint events is the sum of their probabilities. If
F_x(α) is

Figure 2.18 Decomposition of a region I = I_1 ∪ I_2 ∪ I_3 ∪ I_4 ∪ I_5
into a union of small disjoint intervals.

everywhere differentiable, it follows by taking the limit Δ → 0 that the
probability that x lies in I is

    P[\{\omega : x(\omega) \text{ in } I\}] = \int_I F_x'(\alpha)\, d\alpha.

Whenever it exists, the derivative of F_x is called the probability density
function of x and given the symbol p_x. Thus

    P[\{\omega : x(\omega) \text{ in } I\}] = \int_I p_x(\alpha)\, d\alpha, \qquad p_x(\alpha) \triangleq \frac{dF_x(\alpha)}{d\alpha}.        (2.40a)


In order to calculate the probability of an event, we integrate the
probability density function over the region defining the event. In
particular,

    F_x(\alpha) = \int_{-\infty}^{\alpha} p_x(\beta)\, d\beta.        (2.40b)

The class of distribution functions with which we are concerned may
fail to have a continuous derivative at a point, say α = a, for one of two
reasons:

1. The slope of F_x(α) is discontinuous at α = a.

2. F_x(α) has a step discontinuity at α = a, that is,

    P[\{\omega : x(\omega) = a\}] = P_a \ne 0.

In the first case the problem is one of ambiguity, which is easily resolved 
by always taking p x to be the derivative on the right of F x , as implied by 
Eq. 2.38a. 

In the second case the problem is more fundamental but may be
resolved by extending our definition of the probability density function.
Consider a distribution function F_x which has a single discontinuity of
magnitude P_a at the point α = a, as shown in Fig. 2.19. For a region
I = [b, c] which includes this point, the contribution to
P[{ω: x(ω) in I}] from all small subintervals† of [b, c] except the
subinterval [a - ε, a] is

    \int_b^{a - \epsilon} p_x(\alpha)\, d\alpha + \int_a^c p_x(\alpha)\, d\alpha.

† In this section, subintervals such as [β, γ] include the right end
point (γ) but not the left end point (β).

Figure 2.19 A discontinuous probability distribution function.

When ε is sufficiently small, the contribution to P[I] of the interval
[a - ε, a] is P_a, the magnitude of the discontinuity, and

    P[I] = P[\{\omega : b < x(\omega) \le c\}] = \int_b^{a - \epsilon} p_x(\alpha)\, d\alpha + P_a + \int_a^c p_x(\alpha)\, d\alpha.        (2.41)

To reduce Eq. 2.41 to the simple form of Eq. 2.39, we introduce the
Dirac impulse notation.† A unit impulse may be visualized as the limiting

Figure 2.20 The square pulse approaches the unit impulse δ(α - a) when
Δ approaches zero.

form of a positive pulse of unit area as the pulse duration is reduced to
zero, as shown in Fig. 2.20. Operationally, a unit impulse at α = a,
denoted δ(α - a), is defined by the equation

    \int_I f(\alpha)\, \delta(\alpha - a)\, d\alpha = \begin{cases}
        f(a); & \text{if } I \text{ includes point } a \\
        0;    & \text{otherwise}
    \end{cases}        (2.42a)

for any function f continuous at a. If a is an end point of the interval I
and ambiguity is possible, it is desirable to indicate explicitly whether a
is in I. This may be accomplished by using an asterisk if a is not in I.
Thus

    \int_{-\infty}^{a} f(\alpha)\, \delta(\alpha - a)\, d\alpha = \int_{a}^{\infty} f(\alpha)\, \delta(\alpha - a)\, d\alpha = f(a),        (2.42b)

    \int_{-\infty}^{a^*} f(\alpha)\, \delta(\alpha - a)\, d\alpha = \int_{a^*}^{\infty} f(\alpha)\, \delta(\alpha - a)\, d\alpha = 0.        (2.42c)

† The impulse is discussed in more detail in reference 62, Appendix 2A.

A trivial implication is that, for any interval I that includes the
point a,

    \int_I P_a\, \delta(\alpha - a)\, d\alpha = P_a.


If, therefore, we introduce into p_x(α) a term of the form P_a δ(α - a)
for each point at which F_x has a discontinuity, we can again write the
probability that x lies in I in the simple form

    P[\{\omega : x(\omega) \text{ in } I\}] = \int_I p_x(\alpha)\, d\alpha.        (2.43)

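For example, the density function for the distribution considered in
Fig. 2.14 is

    p_x(\alpha) = \tfrac{1}{4}\, \delta(\alpha) + \tfrac{1}{2}\, \delta(\alpha - 1) + \tfrac{1}{4}\, \delta(\alpha - 4),

which implies

    P[\{\omega : x(\omega) \le 1\}] = F_x(1) = \int_{-\infty}^{1} p_x(\alpha)\, d\alpha = \tfrac{1}{4} + \tfrac{1}{2} = \tfrac{3}{4},

as it should.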

Since any distribution function F_x increases monotonically and
F_x(+∞) = 1, any density function must satisfy the properties

    p_x(\alpha) \ge 0; \quad \text{all } \alpha        (2.44a)

and

    \int_{-\infty}^{\infty} p_x(\alpha)\, d\alpha = 1.        (2.44b)

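Examples. The following continuous probability density functions are
frequently encountered. In each case, the parameter b is a positive
constant. The density functions are illustrated in Fig. 2.21.

1. EXPONENTIAL

    p_x(\alpha) = \begin{cases}
        \frac{1}{b}\, e^{-\alpha/b}; & \alpha \ge 0, \\
        0;                           & \alpha < 0,
    \end{cases}        (2.45a)

    F_x(\alpha) = \begin{cases}
        0;                   & \alpha < 0, \\
        1 - e^{-\alpha/b};   & \alpha \ge 0.
    \end{cases}        (2.45b)

2. RAYLEIGH

    p_x(\alpha) = \begin{cases}
        \frac{\alpha}{b}\, e^{-\alpha^2/2b}; & \alpha \ge 0, \\
        0;                                   & \alpha < 0,
    \end{cases}        (2.46a)

    F_x(\alpha) = \begin{cases}
        0;                      & \alpha < 0, \\
        1 - e^{-\alpha^2/2b};   & \alpha \ge 0.
    \end{cases}        (2.46b)

Figure 2.21 Examples of probability density functions: (a) the
exponential density function; (b) the Rayleigh density function; (c) the
uniform density function; (d) the Cauchy density function; (e) the
Gaussian density function.

3. UNIFORM

    p_x(\alpha) = \begin{cases}
        \frac{1}{2b}; & -b \le \alpha < b, \\
        0;            & \text{elsewhere},
    \end{cases}        (2.47a)

    F_x(\alpha) = \begin{cases}
        0;                      & \alpha < -b, \\
        \frac{\alpha + b}{2b};  & -b \le \alpha < b, \\
        1;                      & \alpha \ge b.
    \end{cases}        (2.47b)

4. CAUCHY

    p_x(\alpha) = \frac{b/\pi}{b^2 + \alpha^2}; \quad -\infty < \alpha < \infty,        (2.48a)

    F_x(\alpha) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1} \frac{\alpha}{b}; \quad -\infty < \alpha < \infty.        (2.48b)
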
5. GAUSSIAN


pM = ~ 


(2.49a) 




(2.49b) 


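in which we have defined

    Q(\alpha) \triangleq \frac{1}{\sqrt{2\pi}} \int_{\alpha}^{\infty} e^{-\gamma^2/2}\, d\gamma.        (2.50)

The function Q(α) is not an elementary integral, but its complement,
1 - Q(α), is well tabulated (reference 59). It is related to the more
familiar error function

    \operatorname{erf}(\alpha) \triangleq \frac{2}{\sqrt{\pi}} \int_0^{\alpha} e^{-\gamma^2}\, d\gamma        (2.51a)

by the equation

    Q(\alpha) = \frac{1}{2} \left[ 1 - \operatorname{erf}\!\left( \frac{\alpha}{\sqrt{2}} \right) \right].        (2.51b)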
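
Where tables are not at hand, Eq. 2.51b gives a convenient way to
evaluate Q numerically. The following Python sketch is an illustration
only: it uses the standard-library function math.erf and compares the
result with a direct midpoint-rule integration of Eq. 2.50. The function
names and the truncation of the infinite upper limit at 12 are
assumptions made for this example.

```python
import math


def Q(alpha):
    """Gaussian tail integral of Eq. 2.50, evaluated through Eq. 2.51b."""
    return 0.5 * (1.0 - math.erf(alpha / math.sqrt(2.0)))


def Q_numeric(alpha, upper=12.0, steps=100_000):
    """Midpoint-rule integration of Eq. 2.50, truncated at `upper`."""
    width = (upper - alpha) / steps
    total = 0.0
    for n in range(steps):
        gamma = alpha + (n + 0.5) * width
        total += math.exp(-gamma * gamma / 2.0)
    return total * width / math.sqrt(2.0 * math.pi)


for alpha in (0.0, 1.0, 2.0):
    print(alpha, Q(alpha), Q_numeric(alpha))
```

For large arguments the difference 1 - erf( ) loses precision; using
math.erfc in its place avoids the cancellation.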

As an illustration of the calculation of probabilities by use of density
functions consider the interval I = [1, 2]. If the random variable x is
exponentially distributed,

    P[\{\omega : x(\omega) \text{ in } I\}] = \int_I p_x(\alpha)\, d\alpha = \int_1^2 \frac{1}{b}\, e^{-\alpha/b}\, d\alpha
        = e^{-1/b} - e^{-2/b}.
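
The same probability can be obtained in three ways: from the closed form
just derived, from the difference of distribution-function values
(Eq. 2.45b), and by numerical integration of the density (Eq. 2.45a).
The sketch below compares them; the value b = 1.5 and the helper names
are arbitrary choices for this illustration.

```python
import math


def exponential_pdf(alpha, b):
    """Exponential density, Eq. 2.45a."""
    return math.exp(-alpha / b) / b if alpha >= 0 else 0.0


def exponential_cdf(alpha, b):
    """Exponential distribution function, Eq. 2.45b."""
    return 1.0 - math.exp(-alpha / b) if alpha >= 0 else 0.0


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (n + 0.5) * width) for n in range(steps))


b = 1.5                                              # any positive constant
closed_form = math.exp(-1.0 / b) - math.exp(-2.0 / b)
from_cdf = exponential_cdf(2.0, b) - exponential_cdf(1.0, b)
from_pdf = integrate(lambda alpha: exponential_pdf(alpha, b), 1.0, 2.0)
print(closed_form, from_cdf, from_pdf)
```
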

Two-dimensional density functions. The notational convenience of
writing probabilities as integrals is extended to two random variables,
say x_1 and x_2, by defining a joint density function, p_{x_1,x_2}, in
such a way that for any two-dimensional region I we have

    P[\{\omega : (x_1(\omega), x_2(\omega)) \text{ in } I\}] = \iint_I p_{x_1,x_2}(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2.        (2.52)

(The arguments α_1 and α_2 are associated with x_1 and x_2, respectively.)
To see how p_{x_1,x_2} must be defined in order that Eq. 2.52 may be
valid, let us first consider the small rectangular region shown in
Fig. 2.22:

    [a_1 - \Delta_1 < \alpha_1 \le a_1,\ a_2 - \Delta_2 < \alpha_2 \le a_2].

From Eq. 2.52 we have then

    P[\{\omega : a_1 - \Delta_1 < x_1(\omega) \le a_1,\ a_2 - \Delta_2 < x_2(\omega) \le a_2\}]
        = F_{x_1,x_2}(a_1, a_2) - F_{x_1,x_2}(a_1, a_2 - \Delta_2) - F_{x_1,x_2}(a_1 - \Delta_1, a_2)
          + F_{x_1,x_2}(a_1 - \Delta_1, a_2 - \Delta_2).        (2.53a)

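Figure 2.22 A small rectangular region.

The right-hand side of Eq. 2.53a may be rearranged in the form

    \Delta_1 \left[ \frac{F_{x_1,x_2}(a_1, a_2) - F_{x_1,x_2}(a_1 - \Delta_1, a_2)}{\Delta_1}
        - \frac{F_{x_1,x_2}(a_1, a_2 - \Delta_2) - F_{x_1,x_2}(a_1 - \Delta_1, a_2 - \Delta_2)}{\Delta_1} \right].

If the partial derivative of F_{x_1,x_2} with respect to α_1 exists, for
small Δ_1 this approaches

    \Delta_1 \left[ \frac{\partial F_{x_1,x_2}(a_1, a_2)}{\partial \alpha_1} - \frac{\partial F_{x_1,x_2}(a_1, a_2 - \Delta_2)}{\partial \alpha_1} \right]
        = \Delta_1 \Delta_2 \left[ \frac{ \frac{\partial}{\partial \alpha_1} F_{x_1,x_2}(a_1, a_2) - \frac{\partial}{\partial \alpha_1} F_{x_1,x_2}(a_1, a_2 - \Delta_2) }{\Delta_2} \right].

Finally, if (∂^2/∂α_1 ∂α_2) F_{x_1,x_2}(α_1, α_2) exists, for small Δ_2
this in turn approaches

    \Delta_1 \Delta_2 \left. \frac{\partial^2}{\partial \alpha_1\, \partial \alpha_2} F_{x_1,x_2}(\alpha_1, \alpha_2) \right|_{\alpha_1 = a_1,\ \alpha_2 = a_2}.

Thus, when both Δ_1 and Δ_2 are small, we have

    P[\{\omega : a_1 - \Delta_1 < x_1(\omega) \le a_1,\ a_2 - \Delta_2 < x_2(\omega) \le a_2\}]
        \approx \Delta_1 \Delta_2 \left. \frac{\partial^2}{\partial \alpha_1\, \partial \alpha_2} F_{x_1,x_2}(\alpha_1, \alpha_2) \right|_{\alpha_1 = a_1,\ \alpha_2 = a_2}        (2.53b)

whenever the derivative exists.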

Now consider an arbitrary two-dimensional region, say I, in the (α_1, α_2)
plane, as shown in Fig. 2.23. The region I may be built up of small
disjoint rectangular regions, each having area Δ_1 Δ_2. From probability
property III, the probability that x = (x_1, x_2) lies in I is the sum
of the

Figure 2.23 Building up the region I = I_1 ∪ I_2 out of small disjoint
rectangular regions.

probabilities that x lies in these rectangles. Whenever F_{x_1,x_2} is
differentiable, in the limit as both Δ_1 and Δ_2 go to zero we have

    P[\{\omega : \mathbf{x}(\omega) \text{ in } I\}] = \iint_I \frac{\partial^2}{\partial \alpha_1\, \partial \alpha_2} F_{x_1,x_2}(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2.        (2.54)

Defining

    p_{x_1,x_2}(\alpha_1, \alpha_2) \triangleq \frac{\partial^2}{\partial \alpha_1\, \partial \alpha_2} F_{x_1,x_2}(\alpha_1, \alpha_2),        (2.55)

we can write Eq. 2.54 concisely in vector notation as

    P[\{\omega : \mathbf{x}(\omega) \text{ in } I\}] = \int_I p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha},        (2.56a)

where

    p_{\mathbf{x}}(\boldsymbol{\alpha}) \triangleq p_{x_1,x_2}(\alpha_1, \alpha_2),        (2.56b)

    d\boldsymbol{\alpha} \triangleq d\alpha_1\, d\alpha_2,        (2.56c)

and the (multiple) integration is over all points α in the
two-dimensional region I.

Just as in the one-dimensional case, Eq. 2.55 is inadequate to define
the joint probability density function p_x at points where F_x is
discontinuous. The difficulty is again resolved by using impulses to
account for discontinuities, as illustrated in the examples that follow.

Since any joint distribution function F_{x_1,x_2}(α_1, α_2) is a
monotonically increasing function of both α_1 and α_2, it is clear that
any joint density function p_{x_1,x_2} must be non-negative at every
point (α_1, α_2):

    p_{x_1,x_2}(\alpha_1, \alpha_2) \ge 0; \quad -\infty < \alpha_1 < \infty,\ -\infty < \alpha_2 < \infty.        (2.57a)




Figure 2.24 Examples of the two-dimensional Gaussian density function:
(a) ρ = 0; (b) ρ = 0.5; (c) ρ = 0.9.

Also, since F_{x_1,x_2}(∞, ∞) equals unity, p_{x_1,x_2} must satisfy the
equation

    \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p_{x_1,x_2}(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2 = 1.        (2.57b)


Examples. An example of a valid joint density function that is
everywhere continuous is the two-dimensional Gaussian density function,

    p_{x_1,x_2}(\alpha_1, \alpha_2) = \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp\left[ -\frac{\alpha_1^2 - 2\rho\, \alpha_1 \alpha_2 + \alpha_2^2}{2(1 - \rho^2)} \right]; \quad -1 < \rho < 1,        (2.58)

which is illustrated for several values of the parameter ρ in Fig. 2.24.


Figure 2.25 An impulsive two-dimensional density function. The integral
through an impulse (that is, the number by which a unit impulse is
multiplied) is called the impulse value. An impulse A δ(α - a), or a
two-dimensional product of impulses such as A δ(α_1 - a_1) δ(α_2 - a_2),
is plotted as a vertical line whose height is equal to the value A.

A purely impulsive example is the joint density function

    p_{x_1,x_2}(\alpha_1, \alpha_2) = \sum_{i=1}^{6} \sum_{j=1}^{6} \frac{1}{36}\, \delta(\alpha_1 - i)\, \delta(\alpha_2 - j),        (2.59)

illustrated in Fig. 2.25. The probability is concentrated at the 36 points
(i, j); 1 ≤ i ≤ 6, 1 ≤ j ≤ 6. This density function would be appropriate
as a mathematical model for a dice game.

We may also encounter joint density functions that are impulsive in one
dimension and continuous in the other. An example is

    p_{x_1,x_2}(\alpha_1, \alpha_2) = \sum_{i=1}^{2} \frac{1}{2}\, \delta(\alpha_1 - i)\, \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{(\alpha_2 - i)^2}{2} \right].        (2.60)

As shown in Fig. 2.26, this density function may be visualized as two
“fences” of impulses at α_1 = 1 and α_1 = 2.

For a simple example of the use of a joint density function to calculate
a probability, consider the event A defined by

    A = \{\omega : x_1^2(\omega) + x_2^2(\omega) \le c^2\}

and the two-dimensional Gaussian density function of Eq. 2.58, with the

Figure 2.26 Two “fences” of impulses. The value of the one-dimensional
impulse at α_1 = 1 (or α_1 = 2) depends on α_2.

parameter ρ specialized to zero:

    p_{x_1,x_2}(\alpha_1, \alpha_2) = \frac{1}{2\pi}\, e^{-(\alpha_1^2 + \alpha_2^2)/2}.


From Eq. 2.56a we have

    P[A] = \int_I p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha}
         = \iint_I p_{x_1,x_2}(\alpha_1, \alpha_2)\, d\alpha_1\, d\alpha_2
         = \iint_I \frac{1}{2\pi}\, e^{-(\alpha_1^2 + \alpha_2^2)/2}\, d\alpha_1\, d\alpha_2,

where the region I is the interior of a circle of radius c, centered on the
origin of the (α_1, α_2) plane. The integration is easily carried out by
making the change of variables

    r \triangleq \sqrt{\alpha_1^2 + \alpha_2^2}, \qquad \theta \triangleq \tan^{-1} \frac{\alpha_2}{\alpha_1}.

Since the differential area in polar coordinates is r dr dθ, we have

    P[A] = \int_0^c \int_0^{2\pi} \frac{1}{2\pi}\, e^{-r^2/2}\, r\, d\theta\, dr = \int_0^c r\, e^{-r^2/2}\, dr
         = \int_0^{c^2/2} e^{-\beta}\, d\beta = 1 - e^{-c^2/2}.        (2.61)
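
Equation 2.61 lends itself to a quick simulation check. With ρ = 0 the
joint density of Eq. 2.58 factors into the product of two unit-variance
Gaussian densities, so a pair (x_1, x_2) can be simulated by two
independent calls to a Gaussian generator. The sketch below is an
illustration under that observation; the radius c = 1.5, the trial count,
and the seed are arbitrary choices.

```python
import math
import random


def circle_probability_exact(c):
    """P[x1**2 + x2**2 <= c**2] from Eq. 2.61."""
    return 1.0 - math.exp(-c * c / 2.0)


def circle_probability_monte_carlo(c, trials=200_000, seed=0):
    """Fraction of simulated (x1, x2) pairs that fall inside the circle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1 = rng.gauss(0.0, 1.0)      # zero mean, unit standard deviation
        x2 = rng.gauss(0.0, 1.0)
        if x1 * x1 + x2 * x2 <= c * c:
            hits += 1
    return hits / trials


c = 1.5
print(circle_probability_exact(c), circle_probability_monte_carlo(c))
```

The two numbers should agree to within the statistical fluctuation of
the simulation, which is of the order of 0.001 for this many trials.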

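Elimination of a random variable. It often happens in applications that
we know a joint density function, say p_{x_1,x_2}, but are interested in
the one-dimensional density p_{x_1}, which is easily obtained as follows.
From Equation 2.56,

    F_{x_1,x_2}(\alpha_1, \alpha_2) = \int_{-\infty}^{\alpha_1} \int_{-\infty}^{\alpha_2} p_{x_1,x_2}(\beta_1, \beta_2)\, d\beta_2\, d\beta_1.

But it has already been established (property V of joint distribution
functions) that

    F_{x_1}(\alpha_1) = F_{x_1,x_2}(\alpha_1, \infty),

so that

    F_{x_1}(\alpha_1) = \int_{-\infty}^{\alpha_1} \int_{-\infty}^{\infty} p_{x_1,x_2}(\beta_1, \beta_2)\, d\beta_2\, d\beta_1.

As usual, we obtain p_{x_1} by differentiating F_{x_1}. Since the
derivative of a definite integral with respect to the upper limit is
given by

    \frac{d}{d\alpha} \int_{-\infty}^{\alpha} h(\beta)\, d\beta
        = \lim_{\Delta \to 0} \left[ \frac{ \int_{-\infty}^{\alpha + \Delta} h(\beta)\, d\beta - \int_{-\infty}^{\alpha} h(\beta)\, d\beta }{\Delta} \right]
        = \lim_{\Delta \to 0} \left[ \frac{1}{\Delta} \int_{\alpha}^{\alpha + \Delta} h(\beta)\, d\beta \right] = h(\alpha)        (2.62)

whenever h(β) is continuous at β = α, we have

    p_{x_1}(\alpha_1) = \frac{dF_{x_1}(\alpha_1)}{d\alpha_1}
        = \frac{d}{d\alpha_1} \int_{-\infty}^{\alpha_1} \left[ \int_{-\infty}^{\infty} p_{x_1,x_2}(\beta_1, \beta_2)\, d\beta_2 \right] d\beta_1
        = \int_{-\infty}^{\infty} p_{x_1,x_2}(\alpha_1, \beta_2)\, d\beta_2.        (2.63)

Equation 2.63 is a generalization of the theorem on total probability of
Eq. 2.23b.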

It may be helpful to think of a two-dimensional joint probability density
function p_x(α) as analogous to a mass density distributed over a plane,
where the total mass is unity. The situation can be visualized with the
help of Fig. 2.27. The probability that x lies in a region I is identified
with the total mass located over I, since total mass is also obtained by

Figure 2.27 A two-dimensional probability density function.

integrating (mass) density. We extend the analogy by noting that
integration over one axis of the plane, as in Eq. 2.63, determines the
(one-dimensional or “marginal”) mass density along the remaining axis.
Thus an integral such as

    \int_a^b p_{x_1}(\alpha_1)\, d\alpha_1

corresponds in a two-dimensional problem to determining the total mass
over an infinite strip, parallel to the α_2-axis and extending from
a < α_1 ≤ b, as shown in Fig. 2.27.

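As an example of the calculation of p_{x_1} from p_{x_1,x_2}, consider
the two-dimensional Gaussian density function of Eq. 2.58. Then, for a
given value of ρ, |ρ| < 1,

    p_{x_1}(\alpha_1) = \int_{-\infty}^{\infty} \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp\left[ -\frac{\alpha_1^2 - 2\rho\, \alpha_1 \alpha_2 + \alpha_2^2}{2(1 - \rho^2)} \right] d\alpha_2.

The integral is readily evaluated by completing the square in the exponent
and letting γ = (α_2 - ρ α_1)/(1 - ρ^2)^{1/2}:

    \alpha_1^2 - 2\rho\, \alpha_1 \alpha_2 + \alpha_2^2 = (\alpha_2 - \rho\, \alpha_1)^2 + \alpha_1^2 (1 - \rho^2),

    p_{x_1}(\alpha_1) = \frac{e^{-\alpha_1^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi (1 - \rho^2)}} \exp\left[ -\frac{(\alpha_2 - \rho\, \alpha_1)^2}{2(1 - \rho^2)} \right] d\alpha_2
        = \frac{e^{-\alpha_1^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\gamma^2/2}\, d\gamma
        = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha_1^2/2}.        (2.64)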
Thus x_1 (and also x_2) individually are Gaussianly distributed. For this
joint Gaussian case the total probability that x = (x_1, x_2) lies in the
infinite strip in Fig. 2.27 is therefore

    \int_a^b \int_{-\infty}^{\infty} p_{x_1,x_2}(\alpha_1, \alpha_2)\, d\alpha_2\, d\alpha_1 = \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-\alpha_1^2/2}\, d\alpha_1
        = Q(a) - Q(b).

The function Q( ) has been defined in Eq. 2.50.
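
The result of Eq. 2.64, that the marginal density does not depend on ρ,
can be confirmed numerically by carrying out the integration over α_2
directly. The sketch below is illustrative; the function names, the
truncation of the infinite range at ±12, and the test points are
assumptions made for this example.

```python
import math


def joint_gaussian(a1, a2, rho):
    """Two-dimensional Gaussian density of Eq. 2.58."""
    one_minus = 1.0 - rho * rho
    quad = (a1 * a1 - 2.0 * rho * a1 * a2 + a2 * a2) / (2.0 * one_minus)
    return math.exp(-quad) / (2.0 * math.pi * math.sqrt(one_minus))


def marginal_x1(a1, rho, half_width=12.0, steps=20_000):
    """Midpoint-rule integration of the joint density over a2 (Eq. 2.63)."""
    width = 2.0 * half_width / steps
    total = 0.0
    for n in range(steps):
        a2 = -half_width + (n + 0.5) * width
        total += joint_gaussian(a1, a2, rho)
    return total * width


def unit_gaussian(a1):
    """One-dimensional Gaussian density on the right side of Eq. 2.64."""
    return math.exp(-a1 * a1 / 2.0) / math.sqrt(2.0 * math.pi)


for rho in (0.0, 0.5, 0.9):
    print(rho, marginal_x1(0.7, rho), unit_gaussian(0.7))
```
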

Multidimensional Density Functions 

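By analogy with the two-dimensional case, the probability density
function of a k-component random vector x = (x_1, x_2, ..., x_k) is
defined in such a way that

    P[\{\omega : \mathbf{x}(\omega) \text{ in } I\}] = \int_I p_{\mathbf{x}}(\boldsymbol{\beta})\, d\boldsymbol{\beta}        (2.65)

for any k-dimensional region I. In particular, by letting I denote the
region x ≤ α, where

    \boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_k),        (2.66a)

we have

    F_{\mathbf{x}}(\boldsymbol{\alpha}) = P[\{\omega : \mathbf{x}(\omega) \le \boldsymbol{\alpha}\}]
        = \int_{-\infty}^{\alpha_k} \cdots \int_{-\infty}^{\alpha_2} \int_{-\infty}^{\alpha_1} p_{\mathbf{x}}(\boldsymbol{\beta})\, d\beta_1\, d\beta_2 \cdots d\beta_k.        (2.66b)

Differentiating with respect to the limits, we identify

    p_{\mathbf{x}}(\boldsymbol{\alpha}) = \frac{\partial^k}{\partial \alpha_1\, \partial \alpha_2 \cdots \partial \alpha_k} F_{\mathbf{x}}(\boldsymbol{\alpha})        (2.66c)

whenever F_x(α) is continuous at the point α. At points of discontinuity
impulses are introduced into p_x (as in the one- and two-dimensional
cases) in such a way that Eq. 2.65 is valid.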

As an aid to visualizing the meaning of joint probability density, it is
convenient to interpret Eq. 2.65 as stating that the probability that x
lies in a small k-dimensional region of volume ΔV containing a point
α = a is approximately p_x(a) ΔV whenever p_x(α) is nearly constant over
the region.

In the general k-dimensional case, just as for k = 2, unwanted random
variables are eliminated by integration: If

    \mathbf{x} = (x_1, x_2, \ldots, x_k)        (2.67a)

    \mathbf{x}' = (x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k),        (2.67b)

then

    F_{\mathbf{x}'}(\boldsymbol{\alpha}') = P[\{\omega : \mathbf{x}' \le \boldsymbol{\alpha}'\}]
        = P[\{\omega : \mathbf{x}' \le \boldsymbol{\alpha}',\ x_i \le \infty\}]
        = F_{\mathbf{x}}(\alpha_1, \alpha_2, \ldots, \alpha_{i-1}, \infty, \alpha_{i+1}, \ldots, \alpha_k),

where

    \boldsymbol{\alpha}' = (\alpha_1, \alpha_2, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_k).        (2.67c)

Differentiating with respect to all α_j, j ≠ i, yields

    p_{\mathbf{x}'}(\boldsymbol{\alpha}') = \int_{-\infty}^{\infty} p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\alpha_i,        (2.67d)

which generalizes Eq. 2.63.

Equality of Random Variables 

Let x_1 and x_2 denote two random variables defined on a sample space
Ω. Two random variables are said to be equal if and only if the set of
points ω on which they differ has zero probability; that is, we write

    x_1 = x_2        (2.68a)

if and only if

    P[\{\omega : x_1(\omega) \ne x_2(\omega)\}] = 0.        (2.68b)

In particular, x_1 = x_2 if x_1(ω) = x_2(ω) for all ω in Ω. Since we do
not expect to observe an event of zero probability, we do not make a
distinction between {ω: x_1(ω) ≠ x_2(ω)} = ∅ and the more general
Eq. 2.68b.

Transformation of Variables 

Electrical communication involves the generation and processing of 
random signals: waveforms are transformed by modulation, detection, 
filtering, and so forth. As a consequence, many of the communication 
applications of probability theory involve the generation of new random 
variables by means of transformations applied to given ones. We now 
consider the calculation of the probability density function for new random 
variables obtained by certain simple (but important) transformations. We 
begin by assuming that the density functions of the original random 
variables do not contain impulses. Impulses are considered separately 
at the end of this section. 

Assume that we are given a random variable $x$ with density $p_x$. Let $y$
be a new random variable, obtained from $x$ by a real-valued piecewise-differentiable
transformation

$$y = f(x). \tag{2.69}$$

By Eq. 2.69 we mean that the number $y(\omega)$ associated with each sample
point $\omega$ is

$$y(\omega) = f(x(\omega)).$$

One method of obtaining the density function $p_y$ from $p_x$ and Eq. 2.69 is
first to express in terms of $p_x$ the probability of the set of sample points
$\{\omega : y(\omega) \le \alpha\}$. This gives the probability distribution function of $y$, $F_y$,
from which $p_y$ can be obtained by differentiation.



Figure 2.28 The effect of the transformation $y = x + a$.


Transformation by the addition of a constant. Consider as an example
the transformation

$$y = x + a$$

in which $a$ is a constant. The set of sample points $\{\omega : y(\omega) \le \alpha\}$ is
identically the set $\{\omega : x(\omega) \le \alpha - a\}$. Thus

$$P[\{\omega : y(\omega) \le \alpha\}] = P[\{\omega : x(\omega) \le \alpha - a\}]$$

or

$$F_y(\alpha) = F_x(\alpha - a).$$

Differentiating with respect to $\alpha$, we have

$$p_y(\alpha) = p_x(\alpha - a); \quad y = x + a. \tag{2.70}$$

The density function $p_y$ is the density $p_x$ translated $a$ units to the right, as
shown in Fig. 2.28. For example, if $x$ is Gaussian with density function

$$p_x(\alpha) = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha^2/2}$$

and $y = x + a$, then

$$p_y(\alpha) = p_x(\alpha - a) = \frac{1}{\sqrt{2\pi}}\, e^{-(\alpha - a)^2/2}.$$


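More generally, if $\mathbf{x}$ is a random vector and $\mathbf{y} = \mathbf{x} + \mathbf{a}$, where
$\mathbf{a} \triangleq (a_1, a_2, \ldots, a_k)$ is a constant vector, then

$$F_{\mathbf{y}}(\boldsymbol{\alpha}) = P[\{\omega : \mathbf{x} + \mathbf{a} \le \boldsymbol{\alpha}\}] = P[\{\omega : \mathbf{x} \le \boldsymbol{\alpha} - \mathbf{a}\}] = F_{\mathbf{x}}(\boldsymbol{\alpha} - \mathbf{a})$$

and

$$p_{\mathbf{y}}(\boldsymbol{\alpha}) = \frac{\partial^k}{\partial\alpha_1 \cdots \partial\alpha_k} F_{\mathbf{x}}(\boldsymbol{\alpha} - \mathbf{a}) = p_{\mathbf{x}}(\boldsymbol{\alpha} - \mathbf{a}); \quad \mathbf{y} = \mathbf{x} + \mathbf{a}. \tag{2.71}$$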

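Transformation by multiplication by a constant. A slightly more involved
transformation is

$$y = bx.$$

When $b$ is a positive constant, we have

$$P[\{\omega : y(\omega) \le \alpha\}] = P\left[\left\{\omega : x(\omega) \le \frac{\alpha}{b}\right\}\right]$$

and thus

$$F_y(\alpha) = F_x\!\left(\frac{\alpha}{b}\right),$$

$$p_y(\alpha) = \frac{1}{b}\, p_x\!\left(\frac{\alpha}{b}\right); \quad y = bx,\ b > 0. \tag{2.72a}$$

On the other hand, when $b$ is negative, we have

$$P[\{\omega : y(\omega) \le \alpha\}] = P\left[\left\{\omega : x(\omega) \ge \frac{\alpha}{b}\right\}\right]$$

and thus

$$F_y(\alpha) = 1 - F_x\!\left(\frac{\alpha}{b}\right),$$

$$p_y(\alpha) = -\frac{1}{b}\, p_x\!\left(\frac{\alpha}{b}\right); \quad y = bx,\ b < 0. \tag{2.72b}$$

Equations 2.72a and b can be combined into the single expression

$$p_y(\alpha) = \frac{1}{|b|}\, p_x\!\left(\frac{\alpha}{b}\right); \quad y = bx. \tag{2.73}$$

For example, if $x$ is a Gaussian random variable with density

$$p_x(\alpha) = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha^2/2}$$

and $y = bx$, $b \ne 0$, then

$$p_y(\alpha) = \frac{1}{|b|}\, p_x\!\left(\frac{\alpha}{b}\right) = \frac{1}{\sqrt{2\pi b^2}}\, e^{-\alpha^2/2b^2}.$$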

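More general transformations. Identical reasoning is also applicable to
transformations that are not one-to-one. Consider, first, the half-wave
linear rectifier transformation

$$y = \begin{cases} x; & x \ge 0 \\ 0; & x < 0 \end{cases} \tag{2.74a}$$

illustrated in Fig. 2.29. In terms of the input density function $p_x$, we have

$$P[\{\omega : y(\omega) < 0\}] = 0,$$

$$P[\{\omega : y(\omega) = 0\}] = \int_{-\infty}^{0} p_x(\beta)\, d\beta \triangleq P_0,$$

$$P[\{\omega : 0 < y(\omega) \le \alpha\}] = \int_{0}^{\alpha} p_x(\beta)\, d\beta; \quad \alpha > 0.$$


Figure 2.29 The half-wave rectifier transformation.


For the half-wave linear rectifier it follows that

$$p_y(\alpha) = P_0\, \delta(\alpha) + u_{-1}(\alpha)\, p_x(\alpha), \tag{2.74b}$$

where $u_{-1}(\alpha)$ is the unit step function

$$u_{-1}(\alpha) = \begin{cases} 1; & \alpha \ge 0 \\ 0; & \alpha < 0. \end{cases} \tag{2.74c}$$

A second example is the full-wave quadratic rectifier transformation
$y = x^2$. Clearly,

$$F_y(\alpha) = P[\{\omega : y(\omega) \le \alpha\}] = \begin{cases} \displaystyle\int_{-\sqrt{\alpha}}^{+\sqrt{\alpha}} p_x(\beta)\, d\beta = F_x(\sqrt{\alpha}) - F_x(-\sqrt{\alpha}); & \alpha \ge 0 \\ 0; & \alpha < 0. \end{cases}$$

It follows that

$$p_y(\alpha) = \begin{cases} \dfrac{1}{2\sqrt{\alpha}}\left[p_x(\sqrt{\alpha}) + p_x(-\sqrt{\alpha})\right]; & \alpha > 0 \\ 0; & \alpha < 0. \end{cases}$$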

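For instance, if $x$ is a Rayleigh random variable with density

$$p_x(\alpha) = \begin{cases} \dfrac{\alpha}{b}\, e^{-\alpha^2/2b}; & \alpha \ge 0 \\ 0; & \alpha < 0 \end{cases}$$

(where $b$ is a positive constant) and $y = x^2$, then

$$p_y(\alpha) = \frac{1}{2\sqrt{\alpha}}\left[p_x(\sqrt{\alpha}) + p_x(-\sqrt{\alpha})\right] = \frac{1}{2\sqrt{\alpha}}\left(\frac{\sqrt{\alpha}}{b}\, e^{-\alpha/2b} + 0\right); \quad \alpha > 0,$$

so that

$$p_y(\alpha) = \begin{cases} \dfrac{1}{2b}\, e^{-\alpha/2b}; & \alpha > 0 \\ 0; & \alpha < 0. \end{cases}$$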

We observe that y is an exponentially distributed random variable. 
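The square-law result can be checked numerically. The following Python sketch is an illustration only, not part of the original development: the function names are hypothetical, and the Rayleigh density is assumed to have the form $(\alpha/b)\,e^{-\alpha^2/2b}$ for $\alpha \ge 0$. It applies the two-branch formula for $y = x^2$ and compares the result with the exponential density $(1/2b)\,e^{-\alpha/2b}$.

```python
import math

def square_law_density(p_x):
    """Return p_y for y = x**2 from p_x, using the two-branch formula."""
    def p_y(alpha):
        if alpha <= 0.0:
            return 0.0  # the formula applies only for alpha > 0
        root = math.sqrt(alpha)
        return (p_x(root) + p_x(-root)) / (2.0 * root)
    return p_y

def rayleigh_density(b):
    """Assumed Rayleigh form: (alpha / b) * exp(-alpha**2 / (2 b)), alpha >= 0."""
    def p_x(alpha):
        if alpha < 0.0:
            return 0.0
        return (alpha / b) * math.exp(-alpha * alpha / (2.0 * b))
    return p_x

def exponential_density(alpha, b):
    """Closed form expected for y = x**2 when x is Rayleigh."""
    return math.exp(-alpha / (2.0 * b)) / (2.0 * b) if alpha > 0.0 else 0.0

b = 2.0
p_y = square_law_density(rayleigh_density(b))
for alpha in (0.1, 1.0, 5.0):
    print(alpha, p_y(alpha), exponential_density(alpha, b))
```

The two printed columns should agree to within rounding error for any positive $b$.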

Iterated transformations. It is sometimes convenient with complicated 
transformations to apply the above-mentioned techniques in sequence. 
We illustrate this by the simple example 

y = bx + a. (2.76a) 

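Define the new random variable $z = bx$. Then $y = z + a$ and

$$p_z(\alpha) = \frac{1}{|b|}\, p_x\!\left(\frac{\alpha}{b}\right),$$

$$p_y(\alpha) = p_z(\alpha - a) = \frac{1}{|b|}\, p_x\!\left(\frac{\alpha - a}{b}\right). \tag{2.76b}$$

For instance, if $x$ is Gaussian with density function

$$p_x(\alpha) = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha^2/2},$$

we have

$$p_y(\alpha) = \frac{1}{\sqrt{2\pi b^2}}\, e^{-(\alpha - a)^2/2b^2}. \tag{2.77}$$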
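As a small illustration of Eq. 2.76b, the sketch below builds $p_y$ for $y = bx + a$ from an arbitrary $p_x$ and compares it with the closed form of Eq. 2.77 for a unit-variance Gaussian. The helper names (`affine_density`, `closed_form`) and the particular values of $a$ and $b$ are assumptions chosen for the example.

```python
import math

def gaussian_density(alpha):
    """Zero-mean, unit-variance Gaussian density p_x."""
    return math.exp(-alpha * alpha / 2.0) / math.sqrt(2.0 * math.pi)

def affine_density(p_x, b, a):
    """Return p_y for y = b*x + a with b != 0, following Eq. 2.76b."""
    if b == 0:
        raise ValueError("b must be nonzero")
    return lambda alpha: p_x((alpha - a) / b) / abs(b)

def closed_form(alpha, b, a):
    """Right-hand side of Eq. 2.77."""
    return math.exp(-(alpha - a) ** 2 / (2.0 * b * b)) / math.sqrt(2.0 * math.pi * b * b)

# A negative b exercises the |b| in the combined formula.
p_y = affine_density(gaussian_density, b=-3.0, a=1.5)
for alpha in (-4.0, 0.0, 1.5, 6.0):
    print(alpha, p_y(alpha), closed_form(alpha, -3.0, 1.5))
```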
The resulting random variable $y$ is Rayleigh. This method is further
elaborated in Appendix 2A. 

Impulsive densities. When $y = f(x)$ and $p_x$ contains impulses, we
determine $p_y$ in two parts. The first, resulting from the nonimpulsive component
of $p_x$, is obtained as before; the second, resulting from impulses
in $p_x$, is obtained by the following means. If $p_x$ contains an impulse
$P_a\, \delta(\alpha - a)$, then an impulse of value $P_a$ is added to $p_y$ at the point
$\alpha = f(a)$.


Figure 2.30 A transformation with impulsive densities: (a) $p_x(\alpha)$; (b) $p_y(\alpha)$.

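As an example, consider the half-wave rectifier transformation of
Fig. 2.29 and the density function of Fig. 2.30a,

$$p_x(\alpha) = \tfrac{1}{4}\left[\delta(\alpha - 2) + \delta(\alpha + 2)\right] + \tfrac{1}{2}\, e^{-2|\alpha|}.$$

From Eq. 2.74b the continuous part of $p_x$ contributes to $p_y$ the terms

$$\tfrac{1}{4}\, \delta(\alpha) + \tfrac{1}{2}\, u_{-1}(\alpha)\, e^{-2\alpha}.$$

The impulse $\tfrac{1}{4}\delta(\alpha - 2)$ in $p_x$ contributes to $p_y$ the impulse $\tfrac{1}{4}\delta(\alpha - 2)$. The
impulse $\tfrac{1}{4}\delta(\alpha + 2)$ in $p_x$ contributes to $p_y$ the impulse $\tfrac{1}{4}\delta(\alpha)$. Thus, as
shown in Fig. 2.30b,

$$p_y(\alpha) = \tfrac{1}{2}\, \delta(\alpha) + \tfrac{1}{4}\, \delta(\alpha - 2) + \tfrac{1}{2}\, u_{-1}(\alpha)\, e^{-2\alpha}.$$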


Conditional Probability Density 


Given an event $B$ of nonzero probability, the conditional probability of
an event $A$ has been defined as

$$P[A \mid B] = \frac{P[A, B]}{P[B]}.$$

Often events $A$ and $B$ are defined in terms of random variables. For
example, let $x_1$ and $x_2$ be two random variables defined on a sample space
$\Omega$ and define the events

$$A = \{\omega : a_1 < x_1(\omega) \le b_1\} \tag{2.79a}$$

$$B = \{\omega : a_2 < x_2(\omega) \le b_2\}. \tag{2.79b}$$

Then, whenever the denominator is nonzero,

$$P[A \mid B] = \frac{\displaystyle\int_{a_1}^{b_1}\!\int_{a_2}^{b_2} p_{x_1,x_2}(\alpha, \beta)\, d\beta\, d\alpha}{\displaystyle\int_{a_2}^{b_2} p_{x_2}(\beta)\, d\beta}. \tag{2.79c}$$

If $b_2 = a_2$, however, and $p_{x_2}(\beta)$ is not impulsive at $\beta = a_2$, the denominator
in Eq. 2.79c is zero and the meaning of $P[A \mid B]$ is not immediately
clear.

Before proceeding with the mathematical treatment of this issue, let us
consider in more detail the role played by random variables in modeling
the real world. A random variable with a continuous density function is
an appropriate model for a real-world experiment whenever the outcome
may be any real number. The measurement of a noise voltage at some
time $t_1$ furnishes an example. In such a physical experiment there is a
fundamental limitation to the accuracy of measurement; we cannot read
a voltmeter with infinite precision. Thus "a measured voltage $x$ equals $v$"
actually means that the result of the experiment is a voltage lying in some
interval $v - \Delta < x \le v + \Delta$, where $\Delta$ is a small positive number reflecting
the precision of the voltmeter.

This distinction becomes important when we wish to use the result of
such a measurement as a conditioning statement. In order to retain
physical verisimilitude, we should introduce into our mathematical
formulations a quantity such as $\Delta$. Thus event $B$ in Eq. 2.79 might be

$$\{\omega : v - \Delta < x_2(\omega) \le v + \Delta\},$$

which in general is an event of nonzero probability. Equation 2.79c
then becomes

$$P[A \mid B] = \frac{\displaystyle\int_{a_1}^{b_1}\!\int_{v-\Delta}^{v+\Delta} p_{x_1,x_2}(\alpha, \beta)\, d\beta\, d\alpha}{\displaystyle\int_{v-\Delta}^{v+\Delta} p_{x_2}(\beta)\, d\beta}. \tag{2.80}$$

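From a mathematical viewpoint it is inconvenient to carry along the
parameter $\Delta$. Whenever the ratio on the right-hand side of Eq. 2.80 is
insensitive to the precise value of the (small) quantity $\Delta$, it is simpler to
consider the limit as $\Delta \to 0$, even though $P[B]$ may then approach zero.
Thus we define the conditional probability of $A$, given $x_2 = v$, to be this
limit and write†

$$P[A \mid x_2 = v] \triangleq \lim_{\Delta \to 0} \frac{\displaystyle\int_{a_1}^{b_1}\!\int_{v-\Delta}^{v+\Delta} p_{x_1,x_2}(\alpha, \beta)\, d\beta\, d\alpha}{\displaystyle\int_{v-\Delta}^{v+\Delta} p_{x_2}(\beta)\, d\beta}. \tag{2.81a}$$

Interchanging the order of the limit and the outer integration in Eq. 2.81a, we have

$$P[A \mid x_2 = v] = \int_{a_1}^{b_1} d\alpha\, \lim_{\Delta \to 0} \frac{\displaystyle\int_{v-\Delta}^{v+\Delta} p_{x_1,x_2}(\alpha, \beta)\, d\beta}{\displaystyle\int_{v-\Delta}^{v+\Delta} p_{x_2}(\beta)\, d\beta}. \tag{2.81b}$$

We note in Eq. 2.81b that the conditional probability that $x_1$ will lie in
the interval $[a_1, b_1]$ is obtained by integrating a non-negative quantity over
the interval. Moreover, by Eqs. 2.63 and 2.81a, the integral of this quantity
over the entire real line is unity, so that it meets all the requirements of
a probability density. Accordingly, we define

$$p_{x_1}(\alpha \mid x_2 = v) \triangleq \lim_{\Delta \to 0} \frac{\displaystyle\int_{v-\Delta}^{v+\Delta} p_{x_1,x_2}(\alpha, \beta)\, d\beta}{\displaystyle\int_{v-\Delta}^{v+\Delta} p_{x_2}(\beta)\, d\beta} \tag{2.82}$$

and call $p_{x_1}(\alpha \mid x_2 = v)$ the conditional probability density of $x_1$, given
$x_2 = v$. Equation 2.81b can then be rewritten

$$P[A \mid x_2 = v] = \int_{a_1}^{b_1} p_{x_1}(\alpha \mid x_2 = v)\, d\alpha. \tag{2.83}$$

† Whenever the meaning is unambiguous, we shall henceforth denote events such as
$\{\omega : x_2(\omega) = v\}$ by the simpler expression $x_2 = v$.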
When the density functions in Eq. 2.82 are continuous at $\beta = v$, the
defining equation for the conditional density function simplifies to

$$p_{x_1}(\alpha \mid x_2 = v) = \frac{p_{x_1,x_2}(\alpha, v)}{p_{x_2}(v)} \tag{2.84a}$$

or

$$p_{x_1,x_2}(\alpha, v) = p_{x_1}(\alpha \mid x_2 = v)\, p_{x_2}(v). \tag{2.84b}$$

Equation 2.83 can still be used when the density functions contain impulses
at $\beta = v$; that is, even when there is a finite probability that $x_2 = v$. We
then interpret the right-hand side of Eq. 2.84a to be the ratio of the values
of corresponding impulses in numerator and denominator. It is evident
from Eq. 2.83 that conditional probability density is completely analogous
to ordinary one-dimensional density.



Figure 2.31 A plot of $p_{x,y}(\alpha, \beta)$ illustrating the dependence on $v$ of the shape of
$p_x(\alpha \mid y = v)$. There is no dependence only in the special case $p_{x,y}(\alpha, \beta) = p_x(\alpha)\, p_y(\beta)$;
see, for example, Fig. 2.27.


The relationships between two-dimensional density functions on the one
hand and conditional density functions on the other can be easily visualized
graphically. Consider the continuous joint density function shown in
Fig. 2.31. The shape as a function of $\alpha$ of the conditional density function
$p_{x_1}(\alpha \mid x_2 = v)$ is given by tracing the intersection of the surface $p_{x_1,x_2}(\alpha, \beta)$
with a vertical plane erected on the line $\beta = v$. In general, the shape is
different for different values of $v$. Division by $p_{x_2}(v)$ normalizes the total
area under the trace to unity. Given $x_2 = v$, the conditional distribution
function of $x_1$, denoted $F_{x_1}(\alpha \mid x_2 = v)$, is the area under the normalized
trace from $-\infty$ to $\alpha$.

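As an example of the definitions, consider the random variables $x_1$ and
$x_2$ with the Gaussian density function

$$p_{\mathbf{x}}(\alpha_1, \alpha_2) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left[-\frac{\alpha_1^2 - 2\rho\alpha_1\alpha_2 + \alpha_2^2}{2(1 - \rho^2)}\right]; \quad |\rho| < 1.$$

We have determined in Eq. 2.64 that

$$p_{x_1}(\alpha_1) = \int_{-\infty}^{\infty} p_{\mathbf{x}}(\alpha_1, \alpha_2)\, d\alpha_2 = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha_1^2/2},$$

hence, by symmetry, that

$$p_{x_2}(\alpha_2) = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha_2^2/2}.$$

Thus

$$p_{x_1}(\alpha \mid x_2 = v) = \frac{p_{\mathbf{x}}(\alpha, v)}{p_{x_2}(v)} = \frac{1}{\sqrt{2\pi(1 - \rho^2)}} \exp\left\{-\left[\frac{\alpha^2 - 2\rho\alpha v + \rho^2 v^2}{2(1 - \rho^2)}\right]\right\} = \frac{1}{\sqrt{2\pi(1 - \rho^2)}} \exp\left[-\frac{(\alpha - \rho v)^2}{2(1 - \rho^2)}\right]. \tag{2.85}$$

Given $x_2 = v$, the conditional density function of $x_1$ has the form of
Eq. 2.77 with $a = \rho v$ and $b^2 = 1 - \rho^2$.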

When $|\rho|$ approaches unity, the conditional density function of $x_1$,
given $x_2 = v$, becomes very large for $\alpha \approx \rho v$ and very small elsewhere, as
shown in Fig. 2.32. Since the integral under $p_{x_1}(\alpha \mid x_2 = v)$ is always
unity, we observe that the conditional density function approaches a unit
impulse centered on $\pm v$ as $\rho \to \pm 1$.
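The conditional Gaussian density can also be checked by direct division. The sketch below, with illustrative function names that do not come from the text, forms the ratio of the joint density to the marginal of $x_2$ and compares it with the closed form of Eq. 2.85; it also shows the peak value growing as $|\rho|$ approaches unity.

```python
import math

def joint_density(a1, a2, rho):
    """Two-dimensional Gaussian density with unit variances and correlation rho."""
    norm = 2.0 * math.pi * math.sqrt(1.0 - rho * rho)
    quad = (a1 * a1 - 2.0 * rho * a1 * a2 + a2 * a2) / (2.0 * (1.0 - rho * rho))
    return math.exp(-quad) / norm

def marginal_density(a):
    """Marginal density of either component (unit-variance Gaussian)."""
    return math.exp(-a * a / 2.0) / math.sqrt(2.0 * math.pi)

def conditional_by_ratio(alpha, v, rho):
    """Joint density divided by the marginal of x_2 evaluated at v."""
    return joint_density(alpha, v, rho) / marginal_density(v)

def conditional_closed_form(alpha, v, rho):
    """Eq. 2.85: Gaussian with mean rho*v and variance 1 - rho**2."""
    var = 1.0 - rho * rho
    return math.exp(-(alpha - rho * v) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

for rho in (0.0, 0.8, 0.99):
    v = -1.2
    peak = conditional_closed_form(rho * v, v, rho)  # value at alpha = rho*v
    print(rho, conditional_by_ratio(0.3, v, rho), conditional_closed_form(0.3, v, rho), peak)
```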

Applications. The usefulness of the concept of conditional probability
density can be demonstrated by two examples. For the first example consider
two random variables $x$ and $y$ and the transformation

$$z = x + y.$$

We desire the probability density function of the random variable $z$.

We have already considered a transformation of the form $z = x + \beta$
when $\beta$ is a constant and found (Eq. 2.70, with a change of notation)

$$p_z(\gamma) = p_x(\gamma - \beta). \tag{2.86a}$$

Figure 2.32 The conditional Gaussian density function $p_{x_1}(\alpha \mid x_2 = v)$ as a function
of $\rho$.


This result can be applied to the present problem by use of conditional
probability density. Focus attention on that part of the sample space for
which $y(\omega)$ equals $\beta$. Over this region $x + y$ is $x + \beta$, and Eq. 2.86a is
valid, with the important proviso that we state the condition explicitly.
We have

$$p_z(\gamma \mid y = \beta) = p_x(\gamma - \beta \mid y = \beta). \tag{2.86b}$$

The joint density of $z$ and $y$ is obtained by first multiplying both sides of
Eq. 2.86b by $p_y(\beta)$,

$$p_{z,y}(\gamma, \beta) = p_z(\gamma \mid y = \beta)\, p_y(\beta) = p_x(\gamma - \beta \mid y = \beta)\, p_y(\beta) = p_{x,y}(\gamma - \beta, \beta),$$

and then integrating out the unwanted variable in accord with Eq. 2.63:

$$p_z(\gamma) = \int_{-\infty}^{\infty} p_{x,y}(\gamma - \beta, \beta)\, d\beta; \quad z = x + y. \tag{2.87}$$

J— CO 

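As a second example, consider the product transformation

$$z = xy.$$

For $z = \beta x$, where $\beta$ is constant, we have found (Eq. 2.73)

$$p_z(\gamma) = \frac{1}{|\beta|}\, p_x\!\left(\frac{\gamma}{\beta}\right). \tag{2.88a}$$

Restricting attention to the region of $\Omega$ for which $y(\omega) = \beta$, we have

$$p_z(\gamma \mid y = \beta) = \frac{1}{|\beta|}\, p_x\!\left(\frac{\gamma}{\beta} \,\Big|\, y = \beta\right). \tag{2.88b}$$

Again it is important that the condition be stated explicitly. Multiplying
both sides of Eq. 2.88b by $p_y(\beta)$ and integrating over $\beta$ yields

$$p_z(\gamma) = \int_{-\infty}^{\infty} \frac{1}{|\beta|}\, p_{x,y}\!\left(\frac{\gamma}{\beta}, \beta\right) d\beta; \quad z = xy. \tag{2.89}$$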
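To make Eq. 2.89 concrete, the sketch below evaluates the integral numerically for one simple case. The joint density used (uniform over the unit square) is an assumption chosen for the example, as are the function names and the midpoint-rule step count. For that density the integral reduces to $\int_\gamma^1 d\beta/\beta = -\ln\gamma$ for $0 < \gamma < 1$.

```python
import math

def uniform_square(alpha, beta):
    """Illustrative joint density: uniform over 0 <= alpha, beta <= 1."""
    return 1.0 if (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0) else 0.0

def product_density(joint, gamma, lo, hi, steps=100000):
    """Midpoint-rule approximation to Eq. 2.89 over lo <= beta <= hi."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        beta = lo + (k + 0.5) * h
        if beta == 0.0:
            continue  # skip the singular point of 1/|beta|
        total += joint(gamma / beta, beta) / abs(beta)
    return total * h

for gamma in (0.25, 0.5, 0.9):
    print(gamma, product_density(uniform_square, gamma, 0.0, 1.0), -math.log(gamma))
```

The numerical and analytic columns differ only by the discretization error of the midpoint rule.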

These results, of course, can also be derived by the method of transformation
of variables. For $z = x + y$ the condition $z \le \gamma$ is met by all
points in the $x, y$ plane below the line $x + y = \gamma$, as shown in Fig. 2.33.


Figure 2.33 The region for which $x + y \le \gamma$.


The probability that the point $(x, y)$ will fall in this region is

$$F_z(\gamma) = \int_{\beta=-\infty}^{\infty} d\beta \int_{\alpha=-\infty}^{\gamma-\beta} p_{x,y}(\alpha, \beta)\, d\alpha,$$

and thus

$$p_z(\gamma) = \int_{-\infty}^{\infty} p_{x,y}(\gamma - \beta, \beta)\, d\beta; \quad z = x + y.$$

Statistical Independence 

In the case of random variables the definition of statistical independence 
is somewhat simpler than in the case of events (see Eqs. 2.26 and 2.27). 
We call $k$ random variables $x_1, x_2, \ldots, x_k$ statistically independent if and
only if the joint density function $p_{\mathbf{x}}$ factors into the product $\prod_{i=1}^{k} p_{x_i}$; that
is, if and only if

$$p_{\mathbf{x}}(\boldsymbol{\alpha}) = p_{x_1}(\alpha_1)\, p_{x_2}(\alpha_2) \cdots p_{x_k}(\alpha_k) \quad \text{for all } \boldsymbol{\alpha}. \tag{2.90}$$

Let $\mathbf{x}$ denote a set of $k$ statistically independent random variables and
consider the random vector

$$\mathbf{x}' = (x_1, x_2, \ldots, x_{l-1}, x_{l+1}, \ldots, x_k) \tag{2.91a}$$

obtained by omitting $x_l$. The joint density function $p_{\mathbf{x}'}$ is given by

$$p_{\mathbf{x}'}(\boldsymbol{\alpha}') = \int_{-\infty}^{\infty} p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\alpha_l = \prod_{\substack{i=1 \\ i \ne l}}^{k} p_{x_i}(\alpha_i). \tag{2.91b}$$

We conclude that the components of $\mathbf{x}'$ are also statistically independent.
It follows by induction that the statistical independence of a set of random
variables guarantees the independence of any subset of them.

If we have a set of $k$ events, say $A_1, A_2, \ldots, A_k$, such that each event $A_i$
is defined in terms of a single corresponding random variable $x_i$,

$$A_i = \{\omega : x_i \text{ in } I_i\}; \quad i = 1, 2, \ldots, k, \tag{2.92a}$$

then from Eq. 2.90

$$P[A_1, A_2, \ldots, A_k] = \int_{I_1}\!\int_{I_2}\!\cdots\!\int_{I_k} p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\alpha_1\, d\alpha_2 \cdots d\alpha_k = \prod_{i=1}^{k} \int_{I_i} p_{x_i}(\alpha_i)\, d\alpha_i = \prod_{i=1}^{k} P[A_i] \tag{2.92b}$$

whenever the $\{x_i\}$ are statistically independent.

Similarly, the probability of the intersection of any subset of these events
factors into the product of the probabilities of the individual events. Thus
the statistical independence of the $\{x_i\}$ implies that the set of events
$\{A_1, A_2, \ldots, A_k\}$ is also statistically independent.

An interesting example of statistical independence occurs when each of
$k$ random variables is Gaussian:

$$p_{x_i}(\alpha) = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha^2/2}; \quad i = 1, 2, \ldots, k. \tag{2.93a}$$

Then

$$p_{\mathbf{x}}(\boldsymbol{\alpha}) = \prod_{i=1}^{k} p_{x_i}(\alpha_i) = \frac{1}{(2\pi)^{k/2}} \exp\left[-\frac{1}{2} \sum_{i=1}^{k} \alpha_i^2\right]. \tag{2.93b}$$

If in the two-dimensional Gaussian density function of Eq. 2.58 the parameter
$\rho$ is set equal to zero, then

$$p_{\mathbf{x}}(\alpha_1, \alpha_2) = \frac{1}{2\pi} \exp\left[-\tfrac{1}{2}(\alpha_1^2 + \alpha_2^2)\right]. \tag{2.93c}$$

Thus the condition $\rho = 0$ implies statistical independence. Conversely,
for $\rho \ne 0$ the joint density function does not factor and therefore $x_1$ and
$x_2$ are not independent.

Sums of independent random variables. When random variables are
statistically independent, the form of the probability density function of
their sum is simplified. For $z = x + y$ we have already obtained the
result

$$p_z(\gamma) = \int_{-\infty}^{\infty} p_{x,y}(\gamma - \beta, \beta)\, d\beta.$$

Substituting $p_x p_y$ for $p_{x,y}$ in this equation, we have

$$p_z(\gamma) = \int_{-\infty}^{\infty} p_x(\gamma - \beta)\, p_y(\beta)\, d\beta. \tag{2.94}$$

Equation 2.94 is the convolution of $p_x$ and $p_y$. Using the symbol $*$ to
denote convolution, we can write, for statistically independent random
variables,

$$p_z = p_x * p_y; \quad z = x + y.$$

By induction,

$$p_z = p_{x_1} * p_{x_2} * \cdots * p_{x_k}; \quad z = \sum_{i=1}^{k} x_i,\ \ p_{\mathbf{x}} = \prod_{i=1}^{k} p_{x_i}. \tag{2.95}$$
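A brief numerical illustration of the convolution in Eq. 2.94 follows. It is only a sketch: the uniform densities, the function names, and the midpoint-rule integration are choices made for the example. Convolving two densities that are uniform on $[0, 1]$ should give the triangular density that rises linearly to 1 at $\gamma = 1$ and falls back to 0 at $\gamma = 2$.

```python
def uniform01(alpha):
    """Density uniform on the interval [0, 1]."""
    return 1.0 if 0.0 <= alpha <= 1.0 else 0.0

def convolve_densities(p_x, p_y, gamma, lo, hi, steps=20000):
    """Midpoint-rule approximation to Eq. 2.94; p_y is assumed zero outside [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        beta = lo + (k + 0.5) * h
        total += p_x(gamma - beta) * p_y(beta)
    return total * h

def triangle(gamma):
    """Expected density of the sum of two independent uniform [0, 1] variables."""
    if 0.0 <= gamma <= 1.0:
        return gamma
    if 1.0 < gamma <= 2.0:
        return 2.0 - gamma
    return 0.0

for gamma in (0.5, 1.0, 1.5, 2.5):
    print(gamma, convolve_densities(uniform01, uniform01, gamma, 0.0, 1.0), triangle(gamma))
```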

As in the familiar case of signal analysis, it is often easier to calculate
a $k$-fold convolution by means of Fourier transforms. We define the
characteristic function, denoted by $M_x(\nu)$, of a random variable $x$ to be the
Fourier transform of its density function:

$$M_x(\nu) \triangleq \int_{-\infty}^{\infty} p_x(\alpha)\, e^{j\nu\alpha}\, d\alpha. \tag{2.96}$$

Since

$$|e^{j\nu\alpha}| = 1 \quad \text{and} \quad \int_{-\infty}^{\infty} p_x(\alpha)\, d\alpha = 1,$$

$|M_x(\nu)| \le 1$ and the characteristic function always exists. The density
function is regained by the inverse Fourier transform†

$$p_x(\alpha) = \frac{1}{2\pi} \int_{-\infty}^{\infty} M_x(\nu)\, e^{-j\nu\alpha}\, d\nu. \tag{2.97}$$

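It is well known that when functions are convolved their Fourier transforms
multiply. This can be shown by evaluating $M_z(\nu)$ with the use of
Eq. 2.94:

$$M_z(\nu) = \int_{-\infty}^{\infty} e^{j\nu\gamma}\, p_z(\gamma)\, d\gamma = \int_{-\infty}^{\infty} e^{j\nu\gamma}\, d\gamma \int_{-\infty}^{\infty} p_x(\gamma - \beta)\, p_y(\beta)\, d\beta$$

$$= \int_{-\infty}^{\infty} e^{j\nu\beta}\, p_y(\beta)\, d\beta \int_{-\infty}^{\infty} e^{j\nu(\gamma - \beta)}\, p_x(\gamma - \beta)\, d\gamma$$

$$= M_y(\nu)\, M_x(\nu).$$

It follows by induction that for $k$ statistically independent random
variables

$$M_z(\nu) = \prod_{i=1}^{k} M_{x_i}(\nu); \quad z = \sum_{i=1}^{k} x_i,\ \ p_{\mathbf{x}} = \prod_{i=1}^{k} p_{x_i}, \tag{2.98}$$

from which $p_z$ can be calculated by the inverse transformation of Eq. 2.97.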
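The product rule for characteristic functions can be illustrated numerically. The sketch below assumes the sign convention $e^{j\nu\alpha}$ for the transform and uses the standard fact that the sum of two independent zero-mean, unit-variance Gaussian variables is Gaussian with variance 2. The function names, integration limits, and step count are illustrative choices.

```python
import cmath
import math

def gaussian(variance):
    """Zero-mean Gaussian density with the given variance."""
    def p(alpha):
        return math.exp(-alpha * alpha / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)
    return p

def characteristic_function(p, nu, lo=-12.0, hi=12.0, steps=24000):
    """Midpoint-rule approximation to the integral of p(alpha) * exp(j*nu*alpha)."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        alpha = lo + (k + 0.5) * h
        total += p(alpha) * cmath.exp(1j * nu * alpha)
    return total * h

nu = 0.7
m_x = characteristic_function(gaussian(1.0), nu)   # one unit-variance term
m_z = characteristic_function(gaussian(2.0), nu)   # density of the sum of two such terms
print(m_x * m_x, m_z, math.exp(-nu * nu))
```

All three printed values should agree closely, since for this example $M_x(\nu) = e^{-\nu^2/2}$ and $M_z(\nu) = e^{-\nu^2}$.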

Mixed Probability Expressions 

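In communication problems we frequently consider a sample space $\Omega$
on which some events are defined in terms of random variables or vectors
and some are not. We now develop notation for dealing conveniently
with such probability systems. Consider two $k$-dimensional random
vectors, $\mathbf{x}$ and $\mathbf{y}$, and arbitrary events $B$ and $C$ defined as

$$B \triangleq \{\omega : \mathbf{x}(\omega) \text{ in } I_1\}$$

$$C \triangleq \{\omega : \mathbf{y}(\omega) \text{ in } I_2\},$$

where $I_1$ and $I_2$ are regions of $k$-dimensional space. The following discussion
is general and includes the special case $k = 1$. In terms of notation
previously developed we have

$$P[B] = \int_{I_1} p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha},$$

$$P[C] = \int_{I_2} p_{\mathbf{y}}(\boldsymbol{\beta})\, d\boldsymbol{\beta},$$

$$P[BC] = \int_{I_1}\!\int_{I_2} p_{\mathbf{x},\mathbf{y}}(\boldsymbol{\alpha}, \boldsymbol{\beta})\, d\boldsymbol{\alpha}\, d\boldsymbol{\beta}.$$

† When $p_x$ is impulsive we use the transform pair

$$\int_{-\infty}^{\infty} \delta(\alpha)\, e^{j\nu\alpha}\, d\alpha = 1, \qquad \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-j\nu\alpha}\, d\nu = \delta(\alpha).$$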

The new notation introduced below is a consistent extension of that
already encountered. Let $A$ be an event of nonzero probability.

1. $p_{\mathbf{x}}(\ \mid A)$: The conditional probability of the event $B$, given the event
$A$, is conveniently written

$$P[B \mid A] = \int_{I_1} p_{\mathbf{x}}(\boldsymbol{\alpha} \mid A)\, d\boldsymbol{\alpha}. \tag{2.99}$$

The function $p_{\mathbf{x}}(\ \mid A)$ is called the conditional density function of $\mathbf{x}$,
given $A$.

In common with all quantities conditioned on an event of nonzero probability,
$p_{\mathbf{x}}(\ \mid A)$ may be regarded as the density function of the random
vector $\mathbf{x}$ under the condition that attention is restricted to those sample
points that constitute the event $A$. In effect, $A$ becomes a new sample
space: all theorems and results valid over $\Omega$ are also valid over $A$ whenever
all quantities involved are conditioned on $A$. Thus conditioning
density functions on an event of nonzero probability involves no new
ideas, but only augmented notation. For example,

$$P[B, C \mid A] = \int_{I_1}\!\int_{I_2} p_{\mathbf{x},\mathbf{y}}(\boldsymbol{\alpha}, \boldsymbol{\beta} \mid A)\, d\boldsymbol{\beta}\, d\boldsymbol{\alpha}.$$

2. $p_{\mathbf{x}}(\ , A)$: The probability of the joint event $AB$ is conveniently
written

$$P[AB] = \int_{I_1} p_{\mathbf{x}}(\boldsymbol{\alpha}, A)\, d\boldsymbol{\alpha}.$$

The function $p_{\mathbf{x}}(\ , A)$ is called the joint density function of the random
vector $\mathbf{x}$ and the event $A$.

Since

$$P[AB] = P[A]\, P[B \mid A] = \int_{I_1} P[A]\, p_{\mathbf{x}}(\boldsymbol{\alpha} \mid A)\, d\boldsymbol{\alpha},$$

we have the relation

$$p_{\mathbf{x}}(\ , A) = P[A]\, p_{\mathbf{x}}(\ \mid A). \tag{2.100}$$

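3. $P[A \mid \mathbf{x} = \mathbf{a}]$: The conditional probability of an event $A$, conditioned
on the event $\mathbf{x} = \mathbf{a}$, is defined analogously to the corresponding
one-dimensional definition of Eq. 2.81a:

$$P[A \mid \mathbf{x} = \mathbf{a}] \triangleq \lim_{\Delta \to 0} \frac{\displaystyle\int_{I_\Delta} p_{\mathbf{x}}(\boldsymbol{\alpha}, A)\, d\boldsymbol{\alpha}}{\displaystyle\int_{I_\Delta} p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha}}, \tag{2.101}$$

where

$$I_\Delta \triangleq \{\boldsymbol{\alpha} : \mathbf{a} - \boldsymbol{\Delta} < \boldsymbol{\alpha} \le \mathbf{a} + \boldsymbol{\Delta}\}$$

$$\boldsymbol{\Delta} \triangleq (\Delta, \Delta, \ldots, \Delta).$$

If the density functions are continuous at $\mathbf{x} = \mathbf{a}$, the limit can be evaluated
by noting that as $\Delta$ becomes smaller and smaller both the numerator and
denominator are given to a better and better approximation by the product
of the appropriate density function, evaluated at $\mathbf{x} = \mathbf{a}$, and the volume
of $I_\Delta$. Cancelling this volume in numerator and denominator, we obtain
in the limit

$$P[A \mid \mathbf{x} = \mathbf{a}] = \frac{p_{\mathbf{x}}(\mathbf{a}, A)}{p_{\mathbf{x}}(\mathbf{a})}, \tag{2.102a}$$

or

$$P[A \mid \mathbf{x} = \mathbf{a}]\, p_{\mathbf{x}}(\mathbf{a}) = p_{\mathbf{x}}(\mathbf{a}, A). \tag{2.102b}$$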

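Bayes rule. Both Eqs. 2.102b and 2.100 provide expressions for
$p_{\mathbf{x}}(\mathbf{a}, A)$. Equating these two expressions yields the useful result

$$p_{\mathbf{x}}(\mathbf{a} \mid A)\, P[A] = P[A \mid \mathbf{x} = \mathbf{a}]\, p_{\mathbf{x}}(\mathbf{a}). \tag{2.103a}$$

Equation 2.103a is called the "mixed form" of Bayes rule; "mixed"
refers to the fact that the probability expressions involve both random
variables and events. The two unmixed forms of Bayes rule, from Eqs.
2.21 and 2.84b, are

$$P[A \mid B]\, P[B] = P[B \mid A]\, P[A] \tag{2.103b}$$

$$p_{\mathbf{x}}(\boldsymbol{\alpha} \mid \mathbf{y} = \boldsymbol{\beta})\, p_{\mathbf{y}}(\boldsymbol{\beta}) = p_{\mathbf{y}}(\boldsymbol{\beta} \mid \mathbf{x} = \boldsymbol{\alpha})\, p_{\mathbf{x}}(\boldsymbol{\alpha}). \tag{2.103c}$$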
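Factoring probability expressions. The use of conditional notation
permits us to factor joint probability expressions with considerable freedom.
For example, with three random vectors $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$, we can write

$$p_{\mathbf{x},\mathbf{y},\mathbf{z}}(\boldsymbol{\alpha}, \boldsymbol{\beta}, \boldsymbol{\gamma}) = p_{\mathbf{x}}(\boldsymbol{\alpha})\, p_{\mathbf{y}}(\boldsymbol{\beta} \mid \mathbf{x} = \boldsymbol{\alpha})\, p_{\mathbf{z}}(\boldsymbol{\gamma} \mid \mathbf{x} = \boldsymbol{\alpha}, \mathbf{y} = \boldsymbol{\beta})$$

$$= p_{\mathbf{y}}(\boldsymbol{\beta})\, p_{\mathbf{z}}(\boldsymbol{\gamma} \mid \mathbf{y} = \boldsymbol{\beta})\, p_{\mathbf{x}}(\boldsymbol{\alpha} \mid \mathbf{y} = \boldsymbol{\beta}, \mathbf{z} = \boldsymbol{\gamma}),$$

and so forth. Similarly, mixed expressions can be factored in many
different ways such as

$$p_{\mathbf{x},\mathbf{y}}(\boldsymbol{\alpha}, \boldsymbol{\beta}, A, B) = P[B]\, p_{\mathbf{x}}(\boldsymbol{\alpha} \mid B)\, P[A \mid \mathbf{x} = \boldsymbol{\alpha}, B]\, p_{\mathbf{y}}(\boldsymbol{\beta} \mid \mathbf{x} = \boldsymbol{\alpha}, A, B).$$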

Statistical independence. We have already considered the statistical 
independence of events and the statistical independence of random 
variables. The definitions can be extended in an obvious way to more 
general probability situations. 

1. Two random vectors $\mathbf{x}$ and $\mathbf{y}$ are defined to be statistically independent
if and only if

$$p_{\mathbf{x},\mathbf{y}} = p_{\mathbf{x}}\, p_{\mathbf{y}}. \tag{2.104a}$$

An event $B$ defined exclusively in terms of $\mathbf{x}$ and an event $C$ defined
exclusively in terms of $\mathbf{y}$ are statistically independent,

$$P[BC] = P[B]\, P[C],$$

whenever $\mathbf{x}$ and $\mathbf{y}$ are statistically independent. An alternative expression
for the independence of $\mathbf{x}$ and $\mathbf{y}$ is

$$p_{\mathbf{x}}(\ \mid \mathbf{y} = \boldsymbol{\beta}) = p_{\mathbf{x}}(\ ); \quad \text{for all } \boldsymbol{\beta}, \tag{2.104b}$$

which we also write in the shortened form, but with identical meaning,

$$p_{\mathbf{x} \mid \mathbf{y}} = p_{\mathbf{x}}. \tag{2.104c}$$

We observe that specification of $\mathbf{y}$ does not affect the density function of
$\mathbf{x}$ when $\mathbf{x}$ and $\mathbf{y}$ are independent.

2. A random vector $\mathbf{x}$ and an event $A$ are defined as statistically independent
if and only if

$$p_{\mathbf{x}}(\ , A) = P[A]\, p_{\mathbf{x}}(\ ). \tag{2.105}$$

Then any event $B$ defined only in terms of $\mathbf{x}$ is statistically independent
of $A$:

$$P[BA] = P[B]\, P[A].$$

3. Two random vectors $\mathbf{x}$ and $\mathbf{y}$ are defined as statistically independent
when conditioned on an event $A$ if and only if

$$p_{\mathbf{x},\mathbf{y}}(\ \mid A) = p_{\mathbf{x}}(\ \mid A)\, p_{\mathbf{y}}(\ \mid A). \tag{2.106}$$

Then any event $B$ defined exclusively in terms of $\mathbf{x}$ and any event $C$
defined exclusively in terms of $\mathbf{y}$ satisfy

$$P[BC \mid A] = P[B \mid A]\, P[C \mid A].$$

One important implication of statistical independence is the following:
consider the transformations

$$z_1 = g_1(\mathbf{x}); \quad z_2 = g_2(\mathbf{y}), \tag{2.107a}$$

where $g_1$ and $g_2$ are any two functions mapping the random vectors $\mathbf{x}$
and $\mathbf{y}$ into random variables $z_1$ and $z_2$. (As a special case, $g_1$ and $g_2$ might
be the same function.) We now prove that whenever $\mathbf{x}$ and $\mathbf{y}$ are
statistically independent so also are $z_1$ and $z_2$. The statement follows from
first noting that the events

$$B \triangleq \{\omega : g_1(\mathbf{x}(\omega)) \le \alpha\}$$

and

$$C \triangleq \{\omega : g_2(\mathbf{y}(\omega)) \le \beta\}$$

are statistically independent, since $B$ is defined exclusively in terms of $\mathbf{x}$
and $C$ is defined exclusively in terms of $\mathbf{y}$. But

$$F_{z_1,z_2}(\alpha, \beta) = P[B, C] = P[B]\, P[C] = F_{z_1}(\alpha)\, F_{z_2}(\beta) \tag{2.107b}$$

for any values $\alpha$ and $\beta$. Thus the joint density function of the random
variables $(z_1, z_2)$, obtained by differentiating $F_{z_1,z_2}$ in Eq. 2.107b, can be
factored and the variables are independent. We summarize this result by
stating that functions of statistically independent random vectors (or
variables) are statistically independent.

A Communication Example 

The concepts and notation of conditional probability, which we have
seen to be fundamentally the same whether we are dealing with random
variables or random vectors, are basic to the formulation of communication
theory. We now illustrate many of the essential ideas by considering
the idealized one-dimensional communication example illustrated in
Fig. 2.34. First suppose that there are two possible messages, that is,
$M = 2$. One of these two messages, say $m_0$ or $m_1$, is presented to the transmitter
input, with a priori probabilities $P[m_0]$ and $P[m_1]$. The transmitter
maps the abstract input symbol into a voltage $s$, say $m_0 \to s_0$ and $m_1 \to s_1$,
which is then applied to the channel input. The channel corrupts the
transmitted voltage $s$ by the addition of a statistically independent voltage
$n$, which has a density function $p_n$. Thus the received signal at the channel
output is the sum, $r$, of the random variables $s$ and $n$.

We wish to find a decision rule for the receiver, that is, a rule for
determining whether the receiver output is to be $m_0$ or $m_1$, given any value
of the received voltage. In particular, we seek the (optimum) decision rule
that minimizes the probability of error. The mathematical problem again
corresponds to observing the detector output at point a of Fig. 1.8. In
contrast to the discrete communication model considered earlier, however,
$r$ is now allowed to be any real number rather than being constrained
to a discrete set of values.

Suppose the random voltage $r$ equals $\rho$. As in the discrete communication
example on p. 33, the probability of correct decision is maximized
by mapping $\rho$ into that message $m_i$ for which the a posteriori probability
is maximum; that is, on observing $r = \rho$, we set the receiver output, say
$\hat{m}(\rho)$, equal to $m_0$ if and only if

$$P[m_0 \mid r = \rho] > P[m_1 \mid r = \rho]. \tag{2.108}$$


Figure 2.34 A simple communication model. The transmitter input $m$ is one of the
set of $M$ messages $\{m_i\}$. The transmitter output $s$ is the corresponding member of the
set of $M$ voltages $\{s_i\}$. The receiver output $\hat{m}$ is one of the input set $\{m_i\}$.

We next place Eq. 2.108 in a more convenient form by use of the mixed
Bayes rule of Eq. 2.103a. Thus $\hat{m}(\rho) = m_0$ if and only if

$$\frac{p_r(\rho \mid m_0)\, P[m_0]}{p_r(\rho)} > \frac{p_r(\rho \mid m_1)\, P[m_1]}{p_r(\rho)}$$

or, since the denominator is common to both sides of the inequality, if
and only if

$$p_r(\rho \mid m_0)\, P[m_0] > p_r(\rho \mid m_1)\, P[m_1]. \tag{2.109}$$

We may proceed by noting that $r = s_i + n$ when the transmitted
message is $m_i$. Thus, conditional on the event that $m_i$ is the message input,
$r$ is obtained from $n$ by the addition of the (known) constant $s_i$. Under
this condition $r = \rho$ if and only if $n = \rho - s_i$. Thus, from the section on
transformations,

$$p_r(\rho \mid m_i) = p_n(\rho - s_i \mid m_i). \tag{2.110a}$$

Moreover, since the noise is assumed to be independent of the transmitted
signal, hence of the message,

$$p_n(\rho - s_i \mid m_i) = p_n(\rho - s_i). \tag{2.110b}$$
It follows that the optimum receiver sets $\hat{m}(\rho) = m_0$ if and only if

$$p_n(\rho - s_0)\, P[m_0] > p_n(\rho - s_1)\, P[m_1]. \tag{2.111}$$

The decision rule of Eq. 2.111 may be immediately generalized to
include the case $M > 2$. If the possible input messages are $m_0, m_1, \ldots,
m_{M-1}$, with corresponding transmitter voltages $s_0, s_1, \ldots, s_{M-1}$ and a
priori probabilities $\{P[m_i]\}$, the optimum receiver again assigns $\hat{m}(\rho)$ as
the message with maximum a posteriori probability. It follows immediately
from Eq. 2.111 that $\hat{m}(\rho) = m_i$ if and only if

$$p_n(\rho - s_i)\, P[m_i] > p_n(\rho - s_j)\, P[m_j]; \quad j = 0, 1, \ldots, M - 1,\ j \ne i. \tag{2.112}$$

If two or more messages have the same a posteriori probability, $\rho$ may be
assigned arbitrarily to any one of them without loss of optimality.
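The decision rule of Eq. 2.112 translates directly into a few lines of code. The following Python sketch is one possible rendering, not a prescribed implementation: `map_decision` and `gaussian_noise` are hypothetical names, the Gaussian noise density is merely a convenient choice for the demonstration, and ties are resolved in favor of the lowest index, which the text permits since ties may be broken arbitrarily.

```python
import math

def gaussian_noise(sigma):
    """Zero-mean Gaussian noise density with standard deviation sigma (example choice)."""
    def p_n(alpha):
        return math.exp(-alpha * alpha / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)
    return p_n

def map_decision(rho, signals, priors, noise_density):
    """Return the index i that maximizes p_n(rho - s_i) * P[m_i], as in Eq. 2.112."""
    scores = [noise_density(rho - s) * p for s, p in zip(signals, priors)]
    # max() returns the first maximal index, so ties go to the lowest index.
    return max(range(len(scores)), key=lambda i: scores[i])

p_n = gaussian_noise(sigma=1.0)
binary = map_decision(0.3, [1.0, -1.0], [0.5, 0.5], p_n)
ternary = map_decision(0.4, [2.0, 0.0, -2.0], [1 / 3, 1 / 3, 1 / 3], p_n)
skewed = map_decision(-0.3, [1.0, -1.0], [0.99, 0.01], p_n)
print(binary, ternary, skewed)
```

In the last call the strongly unequal priors pull the decision toward $m_0$ even though the received value is closer to $s_1$.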

The decision rule of Eq. 2.112 cannot be simplified further without
introducing a specific noise density function $p_n$. The Gaussian noise case
in which

$$p_n(\alpha) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\alpha^2/2\sigma^2} \tag{2.113}$$

is frequently encountered. The decision rule then becomes: set $\hat{m}(\rho) = m_i$
if and only if

$$P[m_i]\, e^{-(\rho - s_i)^2/2\sigma^2} > P[m_j]\, e^{-(\rho - s_j)^2/2\sigma^2}; \quad j = 0, 1, \ldots, M - 1,\ j \ne i. \tag{2.114}$$

This situation is illustrated in Fig. 2.35a for $M = 2$. From the figure it
is clear that an equivalent rule is then: assign $\rho$ to $m_0$ if and only if
$\rho > a$, where the threshold $a$ is the value of $\rho$ at which the two curves
intersect. The location of this threshold, from Eq. 2.114 with $M = 2$, is

$$a = \frac{s_0 + s_1}{2} + \frac{\sigma^2}{s_0 - s_1} \ln \frac{P[m_1]}{P[m_0]}. \tag{2.115}$$
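The threshold formula can be verified by checking that the two weighted exponentials of Eq. 2.114 are equal at $\rho = a$. The sketch below assumes $s_0 > s_1$, so that $m_0$ is chosen for $\rho > a$; the function names and numerical values are illustrative.

```python
import math

def map_threshold(s0, s1, p0, p1, sigma):
    """Threshold of Eq. 2.115; assumes s0 > s1 and nonzero priors."""
    return 0.5 * (s0 + s1) + sigma ** 2 / (s0 - s1) * math.log(p1 / p0)

def weighted_likelihood(rho, s, prior, sigma):
    """One side of Eq. 2.114: P[m] * exp(-(rho - s)^2 / (2 sigma^2))."""
    return prior * math.exp(-(rho - s) ** 2 / (2.0 * sigma ** 2))

s0, s1, p0, p1, sigma = 1.0, -1.0, 0.8, 0.2, 0.5
a = map_threshold(s0, s1, p0, p1, sigma)
print(a)  # negative: the threshold moves toward s1 when m0 is more likely
print(weighted_likelihood(a, s0, p0, sigma), weighted_likelihood(a, s1, p1, sigma))
```

With equal priors the logarithm vanishes and the threshold falls midway between the two signal voltages.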


The optimum receiver output $\hat{m}(\rho)$ is determined by Eq. 2.112 for any
value of $M$ and for any specified noise density function $p_n$. It is helpful
to view the function $\hat{m}(\ )$ as partitioning the space of all possible values
of $\rho$ into a set of $M$ disjoint decision regions $\{I_i\}$, $i = 0, 1, \ldots, M - 1$.
For the case illustrated in Fig. 2.35a, $I_0$ is the interval $a < \rho < \infty$ and $I_1$
is the interval $-\infty < \rho < a$. A case with $M = 3$ is shown in Fig. 2.35b.

A correct decision results when $m_i$ is the message if and only if the
received voltage $\rho$ is in the decision region $I_i$. Letting $C$ denote a correct
decision, we have

$$P[C \mid m_i] = \int_{I_i} p_r(\rho \mid m_i)\, d\rho. \tag{2.116}$$
fp] -nrco 4^ P" 7 00 
The probability of error is therefore

$$P[\mathcal{E}] = P[m_0] \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(\rho - s_0)^2/2\sigma^2}\, d\rho + P[m_1] \int_{a}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(\rho - s_1)^2/2\sigma^2}\, d\rho. \tag{2.118a}$$

Equation 2.118a can be expressed in terms of the function $Q(\ )$ of
Eq. 2.50 by making the change of variable $\alpha = (\rho - s_0)/\sigma$ in the first
integral and $\beta = (\rho - s_1)/\sigma$ in the second: then

$$P[\mathcal{E}] = P[m_0]\, Q\!\left(\frac{s_0 - a}{\sigma}\right) + P[m_1]\, Q\!\left(\frac{a - s_1}{\sigma}\right). \tag{2.118b}$$

In the particular case of equally likely messages, $P[m_0] = P[m_1] = \tfrac{1}{2}$,
$a = \tfrac{1}{2}(s_0 + s_1)$ and the error probability is just $Q[(s_0 - s_1)/2\sigma]$.

Input probabilities. Before a transmission occurs, the a priori probability
$P[m_i]$ of each message is known at the receiver. When a voltage
$r = \rho$ is received, the a posteriori probability of each message $m_i$ at the
receiver is $P[m_i \mid r = \rho]$ and the optimum receiver decides in favor of that
message for which the a posteriori probability is greatest. The channel


ler probabilit 


receiver to 


imum receiver 


le in iavor oi 


In the absence of a channel, the "optimum receiver" would always
decide in favor of that message whose a priori probability was greatest,
and the probability of error would be maximum if all possible inputs were
equally probable. A similar statement holds true in general when a channel
is available: accurate communication is most difficult to accomplish when
the messages are equally likely.

We prove this general statement only for the binary-input, Gaussian
noise example. First, note that Eq. 2.115 gives the optimum threshold
$a$ for arbitrary a priori probabilities $P[m_0]$ and $P[m_1]$; any other
choice of threshold, say $b$, would increase the probability of error. In
particular, the choice

$$b = \frac{s_0 + s_1}{2} \tag{2.119a}$$

increases the probability of error over that given by Eq. 2.118 unless,
as is the case only when $P[m_0] = P[m_1]$, $b$ is the optimum threshold. Thus
the minimum probability of error $P[\mathcal{E}]$ of Eq. 2.118 is bounded by

$$P[\mathcal{E}] \le P[m_0] \int_{-\infty}^{b} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(\rho - s_0)^2/2\sigma^2}\, d\rho + P[m_1] \int_{b}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(\rho - s_1)^2/2\sigma^2}\, d\rho. \tag{2.119b}$$

Simplifying, we have

$$P[\mathcal{E}] \le P[m_0]\, Q\!\left(\frac{s_0 - s_1}{2\sigma}\right) + P[m_1]\, Q\!\left(\frac{s_0 - s_1}{2\sigma}\right) = Q\!\left(\frac{s_0 - s_1}{2\sigma}\right). \tag{2.119c}$$

The equality holds only when $P[m_0] = P[m_1]$. Thus the probability of
error for equally likely binary inputs provides a strict upper bound on the
probability of error for nonequally likely binary inputs and the proof is
complete.

Choice of signals. Since equally likely input messages are the most
difficult to communicate, the case of uniform a priori probabilities is an
interesting one to assume when investigating other aspects of a
communication system. For example, let us next consider how the
$P[\mathcal{E}]$ in Eq. 2.118 depends on the signal voltages $s_0$ and $s_1$ when
$P[m_0] = P[m_1] = \tfrac{1}{2}$. Then the optimum threshold $a$ equals
$\tfrac{1}{2}(s_0 + s_1)$, and the probability of error is given by Eq. 2.119c
with the equality.

It is clear from Eq. 2.119c that the probability of error can be forced
arbitrarily close to zero by making the difference voltage $(s_0 - s_1)$
sufficiently large. A more interesting (and realistic) situation results when
there is a constraint on the magnitude of the largest allowable signal, say

\[ |s_i| \le \sqrt{E_b}. \tag{2.120a} \]

Subject to this constraint, it is clear that $(s_0 - s_1)$ is maximized by choosing

\[ s_0 = \sqrt{E_b}, \qquad s_1 = -\sqrt{E_b}, \tag{2.120b} \]

which yields

\[ P[\mathcal{E}] = Q\!\left(\sqrt{E_b/\sigma^2}\right). \tag{2.120c} \]

The minimum attainable error probability then depends only on the ratio
$E_b/\sigma^2$.

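We have remarked that the function $Q(\;)$ is widely tabulated. For
large ratios $E_b/\sigma^2$, a good approximation to the integral is obtained in
the following way. Consider

\[ Q(\alpha) = \int_{\alpha}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\beta^2/2}\, d\beta \]

and integrate by parts. For $\alpha > 0$ we have

\[ \sqrt{2\pi}\, Q(\alpha) = \int_{\alpha}^{\infty} \frac{1}{\beta}\, \beta e^{-\beta^2/2}\, d\beta
   = \frac{1}{\alpha}\, e^{-\alpha^2/2} - \int_{\alpha}^{\infty} \frac{1}{\beta^2}\, e^{-\beta^2/2}\, d\beta. \]

The remaining integral is positive, and a second integration by parts shows
that it is smaller than $e^{-\alpha^2/2}/\alpha^3$. Hence

\[ \frac{1}{\sqrt{2\pi}\,\alpha}\, e^{-\alpha^2/2} \left(1 - \frac{1}{\alpha^2}\right) < Q(\alpha)
   < \frac{1}{\sqrt{2\pi}\,\alpha}\, e^{-\alpha^2/2}; \qquad \alpha > 0. \]

Figure 2.36 The function $Q(\alpha)$ and three bounds.

Substitution of Eq. 2.120c in these bounds yields

\[ \frac{1}{\sqrt{2\pi}\,\sqrt{E_b/\sigma^2}}\, e^{-E_b/2\sigma^2} \left(1 - \frac{\sigma^2}{E_b}\right) < P[\mathcal{E}]
   < \frac{1}{\sqrt{2\pi}\,\sqrt{E_b/\sigma^2}}\, e^{-E_b/2\sigma^2}. \]

Thus the probability of error decreases approximately exponentially with
increasing $E_b/\sigma^2$. Table 2.2 contains some typical values.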


Table 2.2 Binary Error Probability Bounds

Signal-to-Noise
Ratio, E_b/σ²     Lower Bound       P[E]              Upper Bound

      4           2.02 × 10⁻²       2.28 × 10⁻²       2.70 × 10⁻²
     10           7.65 × 10⁻⁴       7.83 × 10⁻⁴       8.50 × 10⁻⁴
     20           3.83 × 10⁻⁶       3.87 × 10⁻⁶       4.02 × 10⁻⁶
     40           1.27 × 10⁻¹⁰      1.27 × 10⁻¹⁰      1.30 × 10⁻¹⁰


Another upper bound to $Q(\alpha)$, which will be useful later, is

\[ Q(\alpha) \le \tfrac{1}{2}\, e^{-\alpha^2/2}; \qquad \alpha \ge 0. \tag{2.122} \]

This bound is also plotted in Fig. 2.36. Proof of Eq. 2.122 is deferred to
Problem 2.26.
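
The bounds on $Q(\alpha)$ are easy to check numerically. The following is a
minimal sketch, not part of the original text: it assumes that $Q(\alpha)$ may
be evaluated through the complementary error function of the Python
standard library, and the function names `q_function` and `q_bounds` are
chosen here only for illustration. It prints the lower bound, $Q$ itself,
the upper bound, and the bound of Eq. 2.122 for the signal-to-noise ratios
of Table 2.2.

```python
import math


def q_function(alpha):
    """Gaussian tail probability Q(alpha), computed from erfc."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))


def q_bounds(alpha):
    """Return (lower, upper, exponential) bounds on Q(alpha) for alpha > 0."""
    g = math.exp(-alpha * alpha / 2.0) / (math.sqrt(2.0 * math.pi) * alpha)
    lower = g * (1.0 - 1.0 / (alpha * alpha))   # from integration by parts
    upper = g                                   # from integration by parts
    exponential = 0.5 * math.exp(-alpha * alpha / 2.0)  # Eq. 2.122
    return lower, upper, exponential


# Signal-to-noise ratios E_b / sigma^2 used in Table 2.2.
for ratio in (4, 10, 20, 40):
    alpha = math.sqrt(ratio)
    lower, upper, exponential = q_bounds(alpha)
    print(f"{ratio:3d}  {lower:.2e}  {q_function(alpha):.2e}  "
          f"{upper:.2e}  {exponential:.2e}")
```

The two integration-by-parts bounds tighten as the ratio grows, whereas
the simpler exponential bound of Eq. 2.122 stays valid down to $\alpha = 0$.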

2.4 EXPECTED VALUE 

Even though random phenomena are unpredictable in detail, we have 
noted that certain average properties exhibit reasonable regularity. An 
empirical average in the real world corresponds to expected value in the 
mathematical model of probability theory. 

As a simple example, consider an experiment that consists of $N$
independent tosses of an ordinary gambling die with faces labeled 1 to 6.
Let $x_i$ denote the result of the $i$th toss. Then each $x_i$ is some integer
between 1 and 6. The empirical average value of the $N$ results, denoted
$\langle x \rangle_N$, is defined as

\[ \langle x \rangle_N = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{2.123} \]

The summation in Eq. 2.123 can be rewritten in the following way:
let $N(j)$ denote the number of tosses that result in the integer $j$. Then,
regrouping terms, we have

\[ \langle x \rangle_N = \frac{1}{N} \sum_{j=1}^{6} j\, N(j) = \sum_{j=1}^{6} j\, f_N(j), \]

where $f_N(j) = N(j)/N$ is the relative frequency, defined in Eq. 2.1, of the
result $j$.

Since the $x_i$ are random, that is, unpredictable in detail, so also is their
empirical average $\langle x \rangle_N$. But when $N$ is large, $f_N(j)$ is almost
always observed to stabilize close to some particular number. This number
corresponds in the mathematical model to the probability $P[j]$. Thus, for
large $N$, we expect $\langle x \rangle_N$ to stabilize at the number $E[x]$ given by

\[ E[x] = \sum_{j=1}^{6} j\, P[j]. \tag{2.124} \]

We call $E[x]$ the expected value of the random variable $x$.
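
The regrouping of Eq. 2.123 into a sum over relative frequencies can be
illustrated with a short simulation. This is only a sketch under stated
assumptions: a pseudorandom generator with a fixed seed stands in for the
physical die, the die is taken to be fair ($P[j] = 1/6$), and the helper
names are illustrative.

```python
import random
from collections import Counter


def empirical_average(tosses):
    """Eq. 2.123: the arithmetic mean of the observed results."""
    return sum(tosses) / len(tosses)


def average_from_frequencies(tosses):
    """The regrouped form: sum over j of j * f_N(j)."""
    n = len(tosses)
    counts = Counter(tosses)          # counts[j] plays the role of N(j)
    return sum(j * counts[j] / n for j in range(1, 7))


rng = random.Random(0)                # fixed seed, so the run is repeatable
tosses = [rng.randint(1, 6) for _ in range(100_000)]

# Eq. 2.124 for a fair die (an assumption of this sketch).
expected_value = sum(j * (1 / 6) for j in range(1, 7))

print(empirical_average(tosses))
print(average_from_frequencies(tosses))
print(expected_value)
```

The first two numbers agree up to rounding because they are the same sum
grouped differently; both lie close to the third when $N$ is large.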

Equation 2.124 defines the expected value for the particular experiment
of tossing a die. More generally, we define the expected value of a random
variable $x$, with density function $p_x$, as

\[ E[x] = \int_{-\infty}^{\infty} \alpha\, p_x(\alpha)\, d\alpha. \tag{2.125} \]

Note that Eq. 2.125 reduces to Eq. 2.124 for

\[ p_x(\alpha) = \sum_{j=1}^{6} P[j]\, \delta(\alpha - j). \]

We shall see in connection with the weak law of large numbers that the
general definition of $E[x]$ retains the property of being the number
onto which we expect an empirical average $\langle x \rangle_N$ to converge. The
expected value of a random variable $x$ is also called its mean value, or
expectation, and is alternatively denoted $\bar{x}$.

The Fundamental Theorem of Expectation 

In many cases we need to calculate the expected value of a random
variable $x$ that is defined by means of a transformation on a random
vector $\mathbf{y}$:

\[ x = g(\mathbf{y}), \tag{2.126a} \]

where $g(\;)$ maps every $k$-dimensional vector into a real number. Although
$E[x]$ can be calculated from Eq. 2.125 by first calculating $p_x$ from the joint
density $p_{\mathbf{y}}$ and the transformation $x = g(\mathbf{y})$, it is often less
laborious to invoke the theorem of expectation, which states that

\[ E[x] = \int_{-\infty}^{\infty} g(\boldsymbol{\beta})\, p_{\mathbf{y}}(\boldsymbol{\beta})\, d\boldsymbol{\beta}. \tag{2.126b} \]

Equation 2.126b can be written still more concisely as

\[ \bar{x} = E[g(\mathbf{y})] = \overline{g(\mathbf{y})}. \tag{2.126c} \]

An intuitive feeling for the validity of Eq. 2.126 can be gained from an
outline of its proof. Let us partition the real line (on which $x$ is defined)
into a large number of small contiguous disjoint intervals of length $\Delta$,
as shown in Fig. 2.37. Let $I_i$ denote the $i$th interval
$[a_i - \Delta/2,\, a_i + \Delta/2]$. Then

\[ P[x \text{ in } I_i] = \int_{a_i - \Delta/2}^{a_i + \Delta/2} p_x(\alpha)\, d\alpha. \]

Figure 2.37 Partitioning the real line into contiguous disjoint intervals.

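We know that the probability of the event $\{x \text{ in } I_i\}$ can also be written
in terms of $p_{\mathbf{y}}$:

\[ P[x \text{ in } I_i] = \int_{B_i} p_{\mathbf{y}}(\boldsymbol{\beta})\, d\boldsymbol{\beta}, \]

where

\[ B_i = \{\boldsymbol{\beta} : a_i - \Delta/2 < g(\boldsymbol{\beta}) < a_i + \Delta/2\}. \]

Since by definition the event $\{\boldsymbol{\beta} \text{ in } B_i\}$ implies that
$g(\boldsymbol{\beta}) \approx a_i$, we have

\[ \int_{a_i - \Delta/2}^{a_i + \Delta/2} \alpha\, p_x(\alpha)\, d\alpha
   \approx a_i \int_{a_i - \Delta/2}^{a_i + \Delta/2} p_x(\alpha)\, d\alpha
   = a_i \int_{B_i} p_{\mathbf{y}}(\boldsymbol{\beta})\, d\boldsymbol{\beta}
   \approx \int_{B_i} g(\boldsymbol{\beta})\, p_{\mathbf{y}}(\boldsymbol{\beta})\, d\boldsymbol{\beta}, \]

in which the approximations are tight for small $\Delta$. Summing over all $i$
yields

\[ \int_{-\infty}^{\infty} \alpha\, p_x(\alpha)\, d\alpha
   = \sum_{i=-\infty}^{\infty} \int_{a_i - \Delta/2}^{a_i + \Delta/2} \alpha\, p_x(\alpha)\, d\alpha
   \approx \sum_{i=-\infty}^{\infty} \int_{B_i} g(\boldsymbol{\beta})\, p_{\mathbf{y}}(\boldsymbol{\beta})\, d\boldsymbol{\beta}
   = \int_{-\infty}^{\infty} g(\boldsymbol{\beta})\, p_{\mathbf{y}}(\boldsymbol{\beta})\, d\boldsymbol{\beta}. \]

Here the last step follows from the fact that the $\{B_i\}$ are disjoint and their
union includes all $\boldsymbol{\beta}$ (the function $g(\;)$ maps every
$\boldsymbol{\beta}$ into some real number). The theorem follows from considering
the limit as $\Delta \to 0$.

As an example of the application of the theorem of Eq. 2.126, consider
the simple one-dimensional transformation $x = y^2$. From Eqs. 2.125 and
2.75,

\[ E[x] = \int_{0}^{\infty} \alpha\, \frac{p_y(+\sqrt{\alpha}) + p_y(-\sqrt{\alpha})}{2\sqrt{\alpha}}\, d\alpha
   = \int_{0}^{\infty} \tfrac{1}{2} \sqrt{\alpha}\, p_y(+\sqrt{\alpha})\, d\alpha
   + \int_{0}^{\infty} \tfrac{1}{2} \sqrt{\alpha}\, p_y(-\sqrt{\alpha})\, d\alpha. \]

Let $\beta = +\sqrt{\alpha}$ in the first integral and $\beta = -\sqrt{\alpha}$ in the
second. Then

\[ E[x] = \int_{0}^{\infty} \beta^2\, p_y(\beta)\, d\beta - \int_{0}^{-\infty} \beta^2\, p_y(\beta)\, d\beta
   = \int_{-\infty}^{\infty} \beta^2\, p_y(\beta)\, d\beta, \]

which is in accord with Eq. 2.126b.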

We conclude that the expected value of a random variable $x$ is a specific
number determined by the mapping $x(\;)$ from $\Omega$ into the real line and by
the probability assignment to events on $\Omega$. Equation 2.126 states that the
value of this number does not depend on whether $p_x$ is described explicitly,
or implicitly in terms of $p_{\mathbf{y}}$ and the transformation $g(\mathbf{y})$.
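
The equivalence of the two routes to $E[x]$ can be checked numerically for
$x = y^2$. The sketch below rests on assumptions that are not in the text:
$y$ is taken to be Gaussian with mean 1 and unit variance (so that
$E[y^2] = 2$), the integrals are approximated by a midpoint rule over
truncated ranges, and all function names are illustrative.

```python
import math


def p_y(beta, mean=1.0):
    """Assumed density of y: Gaussian with the given mean and unit variance."""
    return math.exp(-(beta - mean) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)


def p_x(alpha):
    """Density of x = y^2 obtained from p_y (the explicit route)."""
    root = math.sqrt(alpha)
    return (p_y(root) + p_y(-root)) / (2.0 * root)


def midpoint_integral(func, lower, upper, steps=200_000):
    """Midpoint-rule approximation of the integral of func on [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (k + 0.5) * h) for k in range(steps))


# Explicit route (Eq. 2.125): integrate alpha * p_x(alpha) over alpha > 0.
explicit = midpoint_integral(lambda a: a * p_x(a), 0.0, 100.0)

# Implicit route (Eq. 2.126b): integrate g(beta) * p_y(beta) with g = beta^2.
implicit = midpoint_integral(lambda b: b * b * p_y(b), -9.0, 11.0)

print(explicit, implicit)
```

Both numbers approximate the same expected value; the implicit route
never needs the density $p_x$ at all, which is the practical appeal of the
theorem.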

Linearity. One of the most important properties of expectation is
linearity. Let $x$ and $y$ be two random variables and consider the linear
transformation

\[ z = ax + by. \]

The expected value of the new random variable $z$ follows from Eq. 2.126:

\[ E[z] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (a\alpha + b\beta)\, p_{x,y}(\alpha, \beta)\, d\alpha\, d\beta
   = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} a\alpha\, p_{x,y}(\alpha, \beta)\, d\beta\, d\alpha
   + \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} b\beta\, p_{x,y}(\alpha, \beta)\, d\alpha\, d\beta. \]

Integrating out the variable $\beta$ in the first integral and $\alpha$ in the
second, we obtain

\[ E[z] = \int_{-\infty}^{\infty} a\alpha\, p_x(\alpha)\, d\alpha + \int_{-\infty}^{\infty} b\beta\, p_y(\beta)\, d\beta
   = a E[x] + b E[y], \]

that is,

\[ \bar{z} = a\bar{x} + b\bar{y}. \]

Thus $E[\;]$ can be viewed as a linear operator; that is to say, the expected
value of a weighted sum is the weighted sum of the expected values:

\[ E\!\left[\sum_i a_i x_i\right] = \sum_i a_i E[x_i]. \tag{2.127} \]

This is true whether or not the $\{x_i\}$ are statistically independent.


Expected value of a product. In general, the expected value of a
nonlinear transformation such as $z = xy$ is not the transformation of the
expected values; for example, we have

\[ E[z] = E[xy] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \alpha\beta\, p_{x,y}(\alpha, \beta)\, d\alpha\, d\beta, \]

which usually cannot be simplified. If, however, $x$ and $y$ are statistically
independent, $p_{x,y}$ factors and

\[ E[xy] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \alpha\beta\, p_x(\alpha)\, p_y(\beta)\, d\alpha\, d\beta. \]

The integrations on $\alpha$ and $\beta$ may be performed separately to yield

\[ E[xy] = \int_{-\infty}^{\infty} \alpha\, p_x(\alpha)\, d\alpha \int_{-\infty}^{\infty} \beta\, p_y(\beta)\, d\beta = E[x]\, E[y], \]

or

\[ \overline{xy} = \bar{x}\,\bar{y}; \qquad x \text{ and } y \text{ statistically independent.} \tag{2.128} \]

Thus the statistical independence of random variables guarantees that the
mean of the product is the product of the means. It should be emphasized
that the converse statement is not necessarily true; $\overline{xy} = \bar{x}\,\bar{y}$
does not usually imply statistical independence of the random variables $x$ and $y$.
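
A small discrete case makes the failure of the converse concrete. The
example below is an illustration chosen here, not one taken from the text:
$x$ takes the values $-1$, $0$, $1$ with equal probability and $y = x^2$, so that
$y$ is completely determined by $x$. Exact rational arithmetic is used so
that the comparisons are not blurred by rounding.

```python
from fractions import Fraction

# Assumed distribution of x: uniform on {-1, 0, 1}; y is defined as x * x.
outcomes = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}


def expect(g):
    """Expected value of g(x) over the discrete distribution of x."""
    return sum(prob * g(x) for x, prob in outcomes.items())


mean_x = expect(lambda x: x)             # E[x]
mean_y = expect(lambda x: x * x)         # E[y] with y = x^2
mean_xy = expect(lambda x: x * x * x)    # E[xy] = E[x^3]

# Test of independence on one pair of values: P[x = 1, y = 1] against
# P[x = 1] P[y = 1].
p_joint = outcomes[1]                    # x = 1 forces y = 1
p_x1 = outcomes[1]
p_y1 = outcomes[-1] + outcomes[1]

print(mean_xy == mean_x * mean_y)        # mean of product equals product of means
print(p_joint == p_x1 * p_y1)            # joint probability factors?
```

The first comparison holds and the second fails: the mean of the product
factors even though the variables are as dependent as they can be.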

Moments 

Of particular importance in the sequel are the expected values of the
powers of a random variable. Whenever the value of the integral is finite,
we call

\[ E[x^n] = \int_{-\infty}^{\infty} \alpha^n\, p_x(\alpha)\, d\alpha \tag{2.129a} \]

the $n$th moment of $x$ and

\[ E[(x - \bar{x})^n] = \int_{-\infty}^{\infty} (\alpha - \bar{x})^n\, p_x(\alpha)\, d\alpha \tag{2.129b} \]

the $n$th central moment. In the trivial case $n = 0$ we have

\[ E[x^0] = E[1] = 1. \]

The second central moment of $x$ is given the special name variance and
denoted $\sigma_x^2$. From Eq. 2.127

\[ \sigma_x^2 = E[(x - \bar{x})^2] = E[x^2] - 2\bar{x}\, E[x] + \bar{x}^2
   = \overline{x^2} - \bar{x}^2. \tag{2.129c} \]

The square root of the variance, $\sigma_x$, is called the standard deviation.

If we think of a one-dimensional probability distribution $p_x$ as analogous
to a mass distribution along a rod, the moments $E[x^n]$ also have direct
physical analogs. The mean $\bar{x}$ corresponds to the center of gravity;
$\overline{x^2}$, to

Figure 2.38 The density function of a uniformly distributed random variable.

The second moment does not exist, however, since the integral

\[ \overline{x^2} = \int_{-\infty}^{\infty} \alpha^2\, \frac{b/\pi}{b^2 + \alpha^2}\, d\alpha \]

is not finite.

It was shown in Eq. 2.127 that the mean of a sum of random variables
$\{x_i\}$ is the sum of the means, regardless of whether the variables are or are
not statistically independent. Given that the $\{x_i\}$ are pairwise statistically
independent (but not, in general, otherwise) the same statement holds
true also for the variance of a sum; by "pairwise statistically independent"
random variables we mean

\[ p_{x_i, x_j} = p_{x_i}\, p_{x_j}; \qquad \text{for all } i \text{ and all } j \ne i. \tag{2.132} \]

A general proof is obtained by letting

\[ y = \sum_{i=1}^{N} a_i x_i, \tag{2.133a} \]

where the $a_i$ are constants. Then

\[ \sigma_y^2 = E[(y - \bar{y})^2]
   = E\!\left[\Big(\sum_i a_i x_i - \sum_i a_i \bar{x}_i\Big)^2\right]
   = E\!\left[\Big(\sum_i a_i (x_i - \bar{x}_i)\Big)^2\right] \]
\[ \phantom{\sigma_y^2} = E\!\left[\sum_i a_i^2 (x_i - \bar{x}_i)^2
   + \sum_i \sum_{j \ne i} a_i a_j (x_i - \bar{x}_i)(x_j - \bar{x}_j)\right]. \]

But Eq. 2.132 states that each term in the double summation above
involves two statistically independent random variables, the mean of the
product of which is the product of the means (Eq. 2.128). Thus the
expected value of the double summation is zero and

\[ \sigma_y^2 = E\!\left[\sum_i a_i^2 (x_i - \bar{x}_i)^2\right] = \sum_i a_i^2\, \sigma_{x_i}^2. \tag{2.133b} \]

In particular,

\[ \sigma_y^2 = a^2 \sigma_x^2; \qquad \text{for } y = ax. \tag{2.134} \]
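
That pairwise independence alone suffices for the variances to add can
be seen in a small enumerated example. The construction below is an
assumption made for illustration and is not taken from the text: $x_1$ and
$x_2$ are independent and equally likely to be $\pm 1$, and $x_3 = x_1 x_2$. The
three variables are pairwise independent but not jointly independent.

```python
from fractions import Fraction
from itertools import product

# Four equally likely sample points (x1, x2, x3) with x3 = x1 * x2.
quarter = Fraction(1, 4)
sample_points = [(x1, x2, x1 * x2) for x1, x2 in product((-1, 1), repeat=2)]


def prob(event):
    """Probability of the set of sample points for which event(w) is true."""
    return sum(quarter for w in sample_points if event(w))


def expect(g):
    """Expected value of g(w) over the sample space."""
    return sum(quarter * g(w) for w in sample_points)


def variance(g):
    """Second central moment of the random variable g."""
    mean = expect(g)
    return expect(lambda w: (g(w) - mean) ** 2)


# Eq. 2.132 checked for every pair of variables and every pair of values.
pairwise_independent = all(
    prob(lambda w: w[i] == u and w[j] == v)
    == prob(lambda w: w[i] == u) * prob(lambda w: w[j] == v)
    for i in range(3) for j in range(i + 1, 3)
    for u in (-1, 1) for v in (-1, 1)
)

# Joint independence would require P[x1 = x2 = x3 = 1] = 1/8.
jointly_independent = prob(lambda w: w == (1, 1, 1)) == Fraction(1, 8)

variance_of_sum = variance(lambda w: sum(w))
sum_of_variances = sum(variance(lambda w, k=k: w[k]) for k in range(3))

print(pairwise_independent, jointly_independent)
print(variance_of_sum, sum_of_variances)
```

The variance of the sum equals the sum of the variances, as Eq. 2.133b
(with all $a_i = 1$) requires, although the joint density does not factor.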

Characteristic functions. In Eq. 2.96 we defined the characteristic
function of a random variable $x$ as the Fourier transform of the density
function $p_x$:

\[ M_x(\nu) \triangleq \int_{-\infty}^{\infty} p_x(\alpha)\, e^{j\nu\alpha}\, d\alpha. \tag{2.135} \]

Alternatively, we can view $M_x(\nu)$ in the light of Eq. 2.126 as the expected
value of the random variable $e^{j\nu x}$. Thus

\[ M_x(\nu) = E[e^{j\nu x}] = \overline{e^{j\nu x}}. \tag{2.136} \]

This interpretation requires that we extend our definition of "random
variable" to include mappings from $\Omega$ into the complex plane, whereas
heretofore we have considered only mappings into the real line. A
complex random variable, $w$, is defined as a pair of mappings such as

\[ w(\omega) = x(\omega) + j\, y(\omega). \tag{2.137a} \]

Similarly, the expected value of $w$ is defined in terms of the expected
values of the real random variables $x$ and $y$ as

\[ \bar{w} = \bar{x} + j\,\bar{y}. \tag{2.137b} \]

The probability of any event defined in terms of $w$ can be calculated from
knowledge of the joint density function $p_{x,y}$.

Characteristic functions play a role in probability theory that is
equivalent to that played by Fourier transforms in signal analysis. In
particular, many theorems are proved in the transform domain. For example,
consider again the problem of finding the density function of the sum of
two statistically independent random variables, say $z = x + y$. From
Eqs. 2.136 and 2.128 we have directly

\[ M_z(\nu) = \overline{e^{j\nu(x+y)}} = \overline{e^{j\nu x} e^{j\nu y}}
   = \overline{e^{j\nu x}}\;\overline{e^{j\nu y}} = M_x(\nu)\, M_y(\nu), \]

and therefore

\[ p_z = p_x * p_y, \]

which is in accord with Eqs. 2.95 and 2.98.

An important attribute of characteristic functions is their relation to
moments. Taking the $n$th derivative with respect to $\nu$ of both sides of
Eq. 2.135 yields

\[ \frac{d^n}{d\nu^n} M_x(\nu) = \int_{-\infty}^{\infty} (j\alpha)^n\, e^{j\nu\alpha}\, p_x(\alpha)\, d\alpha. \tag{2.138} \]

Evaluating Eq. 2.138 at $\nu = 0$ and denoting the $n$th derivative by a
superscript $(n)$, we have

\[ M_x^{(n)}(0) = \int_{-\infty}^{\infty} (j\alpha)^n\, p_x(\alpha)\, d\alpha = (j)^n E[x^n]. \tag{2.139} \]

Thus,

\[ M_x(0) = 1, \]
\[ -j\, M_x^{(1)}(0) = \bar{x}, \]
\[ -M_x^{(2)}(0) = \overline{x^2}, \]
\[ (-j)^n M_x^{(n)}(0) = \overline{x^n}. \]


Characteristic function of Gaussian variable. If $x$ is a Gaussian random
variable, its moments are easily obtained from its characteristic function.
First let

\[ p_x(\alpha) = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha^2/2}. \tag{2.140a} \]

Then

\[ M_x(\nu) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{j\nu\alpha}\, e^{-\alpha^2/2}\, d\alpha
   = \frac{e^{-\nu^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\!\left[-\tfrac{1}{2}(\alpha - j\nu)^2\right] d\alpha. \]

Making the change of variable $s = \alpha - j\nu$ and integrating in the complex
plane, we have

\[ M_x(\nu) = \frac{e^{-\nu^2/2}}{\sqrt{2\pi}} \int_{-\infty - j\nu}^{\infty - j\nu} \exp\!\left(-\tfrac{1}{2} s^2\right) ds, \]

where the integration is along a line parallel to the real axis. Consider the
rectangular contour in Fig. 2.39. Since the function $e^{-s^2/2}$ has no poles,
the integral around the entire contour is zero. Also, as $l$ goes to infinity,
the integrand evaluated at $\mathrm{Re}(s) = \pm l$ goes to zero exponentially as
$e^{-l^2/2}$. It follows that the contribution to the contour integral from the
vertical sides of the rectangle is zero. Thus

\[ \int_{-\infty - j\nu}^{\infty - j\nu} e^{-s^2/2}\, ds = \int_{-\infty}^{\infty} e^{-s^2/2}\, ds = \sqrt{2\pi} \]

and

\[ M_x(\nu) = e^{-\nu^2/2}. \tag{2.140b} \]

Figure 2.39 The contour of integration for evaluating the Gaussian characteristic
function.

Next consider the random variable $y$ obtained from the Gaussian
random variable $x$ by the transformation $y = a + bx$. From Eq. 2.77,

\[ p_y(\alpha) = \frac{1}{\sqrt{2\pi b^2}}\, \exp\!\left[-\frac{(\alpha - a)^2}{2b^2}\right]. \]

The characteristic function of $y$ follows from Eq. 2.140b:

\[ M_y(\nu) = \overline{e^{j\nu(a + bx)}} = e^{j\nu a}\, \overline{e^{j(\nu b)x}} = e^{j\nu a} M_x(b\nu)
   = e^{j\nu a} \exp\!\left(-\tfrac{1}{2}\nu^2 b^2\right). \]

The moments of $y$ are then given by Eq. 2.139. Specifically, we identify
the mean and variance:

\[ \bar{y} = -j\,(ja - \nu b^2)\, \exp\!\left(j\nu a - \tfrac{1}{2}\nu^2 b^2\right)\Big|_{\nu = 0} = a, \]
\[ \overline{y^2} = -\left[-b^2 + (ja - \nu b^2)^2\right] \exp\!\left(j\nu a - \tfrac{1}{2}\nu^2 b^2\right)\Big|_{\nu = 0} = a^2 + b^2, \]
\[ \sigma_y^2 = \overline{y^2} - \bar{y}^2 = b^2. \]

In order to place these results in evidence, the density function of $y$ is often
written in the standard form

\[ p_y(\alpha) = \frac{1}{\sqrt{2\pi}\,\sigma_y}\, \exp\!\left[-\frac{(\alpha - \bar{y})^2}{2\sigma_y^2}\right]. \tag{2.141} \]

The function of Eq. 2.141 is called the general one-dimensional Gaussian
density function.

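Now consider the sum

\[ z \triangleq \sum_{i=1}^{N} y_i, \]

where the $y_i$ are statistically independent Gaussian random variables with

\[ E[y_i] = \bar{y}_i, \qquad \sigma_{y_i}^2 = \sigma_i^2; \]

hence

\[ M_{y_i}(\nu) = \exp\!\left(-\tfrac{1}{2}\nu^2 \sigma_i^2 + j\nu \bar{y}_i\right). \]

By Eq. 2.98

\[ M_z(\nu) = \prod_{i=1}^{N} M_{y_i}(\nu) = \exp\!\left(-\tfrac{1}{2}\nu^2 \sigma^2 + j\nu m\right), \tag{2.142a} \]

in which

\[ \sigma^2 = \sum_{i=1}^{N} \sigma_i^2, \tag{2.142b} \]

\[ m = \sum_{i=1}^{N} \bar{y}_i. \tag{2.142c} \]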

Noting that $M_z(\nu)$ is the characteristic function of a Gaussian random
variable with mean $m$ and variance $\sigma^2$, we conclude that the sum of
statistically independent Gaussian random variables is also Gaussian.

We may determine the higher moments of a Gaussian random variable
by means of a power series expansion of its characteristic function.
Consider the random variable $y$ with characteristic function

\[ M_y(\nu) = \exp\!\left(-\tfrac{1}{2}\nu^2 \sigma_y^2\right). \]

With the help of the expansion

\[ e^{a} = 1 + a + \frac{1}{2!} a^2 + \cdots + \frac{1}{n!} a^n + \cdots, \tag{2.143} \]

we can write

\[ M_y(\nu) = 1 - \frac{\nu^2 \sigma_y^2}{2} + \cdots + (-1)^n \frac{\nu^{2n} \sigma_y^{2n}}{2^n\, n!} + \cdots. \tag{2.144a} \]

Moreover,

\[ e^{j\nu y} = 1 + j\nu y + \frac{(j\nu)^2 y^2}{2!} + \cdots + \frac{(j\nu)^n y^n}{n!} + \cdots, \]

so that,† whenever all moments of $y$ are finite,

\[ M_y(\nu) = \overline{e^{j\nu y}} = 1 + j\nu \bar{y} + \frac{(j\nu)^2}{2!}\, \overline{y^2} + \cdots
   + \frac{(j\nu)^n}{n!}\, \overline{y^n} + \cdots. \tag{2.144b} \]

Equating coefficients of like powers of $\nu$ in Eqs. 2.144a and b, we have
(for a zero-mean Gaussian random variable)

\[ \overline{y^n} = \begin{cases} 0; & n \text{ odd} \\[4pt]
   \dfrac{n!}{2^{n/2}\,(n/2)!}\, \sigma_y^n; & n \text{ even,} \end{cases} \tag{2.145a} \]

or, more simply,

\[ \overline{y^n} = \begin{cases} 0; & n \text{ odd} \\[4pt]
   (n - 1)(n - 3)(n - 5) \cdots (1)\, \sigma_y^n; & n \text{ even.} \end{cases} \tag{2.145b} \]

In particular,

\[ \overline{y^4} = 3\sigma_y^4, \]
\[ \overline{y^6} = 15\sigma_y^6. \]
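
The two forms of the even moments can be compared directly, and both can
be checked against a numerical evaluation of the defining integral. The
following sketch makes assumptions beyond the text: the integral of
Eq. 2.129a is approximated by a midpoint rule over a truncated range, and
the function names are illustrative.

```python
import math


def gaussian_moment(n, sigma=1.0):
    """nth moment of a zero-mean Gaussian variable, factorial form (Eq. 2.145a)."""
    if n % 2 == 1:
        return 0.0
    half = n // 2
    return math.factorial(n) * sigma ** n / (2 ** half * math.factorial(half))


def gaussian_moment_product(n, sigma=1.0):
    """Same moment written as (n-1)(n-3)...(1) sigma^n (Eq. 2.145b)."""
    if n % 2 == 1:
        return 0.0
    coefficient = 1
    for k in range(n - 1, 0, -2):
        coefficient *= k
    return coefficient * sigma ** n


def gaussian_moment_numeric(n, sigma=1.0, steps=200_000, span=12.0):
    """Midpoint-rule value of the moment integral over [-span*sigma, span*sigma]."""
    h = 2.0 * span * sigma / steps
    total = 0.0
    for k in range(steps):
        alpha = -span * sigma + (k + 0.5) * h
        total += alpha ** n * math.exp(-alpha * alpha / (2.0 * sigma * sigma))
    return total * h / (math.sqrt(2.0 * math.pi) * sigma)


for n in (2, 4, 6, 8):
    print(n, gaussian_moment(n), gaussian_moment_product(n),
          gaussian_moment_numeric(n))
```

For unit variance the fourth and sixth moments come out as 3 and 15, in
agreement with the special cases quoted above.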


2.5 LIMIT THEOREMS 

We shall now study several of the limit theorems that form the core of 
probability theory. 

The variance $\sigma_x^2$ of a random variable $x$ in some sense is a measure of
the variable's "randomness." For instance, Eq. 2.130c states that the
variance of a uniformly distributed random variable is $b^2/12$, where $b$ is
the width of the density function. Specifying the variance essentially
constrains the effective width of the density function. Figure 2.40 illustrates
this effect for the Gaussian density function.

† To obtain Eq. 2.144b rigorously would require proof that the linearity property of
the expectation operator $E[\;]$ extends to infinite sums.

Figure 2.40 The Gaussian probability density function for two values of variance.

A precise statement of the constraint is due to Chebyshev. Let $y$ be a
zero-mean random variable with finite variance $\sigma_y^2$. Chebyshev's inequality
states that for any positive number $\epsilon$

\[ P[|y| \ge \epsilon] \le \frac{\sigma_y^2}{\epsilon^2}; \qquad \bar{y} = 0. \tag{2.146} \]

Equation 2.146 can be proved as follows. By definition,

\[ \overline{y^2} = \int_{-\infty}^{\infty} \alpha^2\, p_y(\alpha)\, d\alpha. \]

Since the integrand is positive,

\[ \overline{y^2} \ge \int_{|\alpha| \ge \epsilon} \alpha^2\, p_y(\alpha)\, d\alpha. \]

This bound can be weakened further by replacing $\alpha^2$ with its smallest value,
$\epsilon^2$, which yields

\[ \overline{y^2} \ge \epsilon^2 \int_{|\alpha| \ge \epsilon} p_y(\alpha)\, d\alpha = \epsilon^2\, P[|y| \ge \epsilon]. \]

Chebyshev's inequality follows from dividing by $\epsilon^2$ and recalling that
$\overline{y^2} = \sigma_y^2$ for a zero-mean variable.
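
Chebyshev's inequality can be compared with an exact tail probability in
a simple discrete case. The example is an assumption made here for
illustration: $y$ is the result of a fair die minus its mean 3.5, so that $y$
has zero mean. Exact rational arithmetic keeps the comparison free of
rounding.

```python
from fractions import Fraction

# Zero-mean variable: fair die result minus 3.5, each value with probability 1/6.
values = [Fraction(2 * j - 7, 2) for j in range(1, 7)]
prob = Fraction(1, 6)

variance = sum(prob * v * v for v in values)


def tail_probability(eps):
    """Exact P[|y| >= eps]."""
    return sum(prob for v in values if abs(v) >= eps)


def chebyshev_bound(eps):
    """Right-hand side of Eq. 2.146."""
    return variance / (eps * eps)


for eps in (Fraction(1, 2), Fraction(3, 2), Fraction(5, 2)):
    print(eps, tail_probability(eps), chebyshev_bound(eps))
```

The bound holds for every $\epsilon$ but is loose, exceeding 1 for small $\epsilon$;
this looseness is what motivates the sharper bounds developed later in the
section.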


The Weak Law of Large Numbers 

The simplest of the limit theorems that we shall consider follows
directly from Chebyshev's inequality. Consider the sum of $N$ identically
distributed statistically independent random variables $\{x_i\}$, each with
mean $\bar{x}$ and variance $\sigma_x^2$. Let a new random variable $m$ be defined as

\[ m = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{2.147} \]

From Eqs. 2.127 and 2.133, the mean and variance of $m$ are

\[ \bar{m} = \frac{1}{N} \sum_{i=1}^{N} \bar{x}_i = \frac{N\bar{x}}{N} = \bar{x} \tag{2.148a} \]

and

\[ \sigma_m^2 = \frac{1}{N^2} \sum_{i=1}^{N} \sigma_{x_i}^2 = \frac{N\sigma_x^2}{N^2} = \frac{\sigma_x^2}{N}. \tag{2.148b} \]

In order to invoke Chebyshev's inequality, we define

\[ y \triangleq m - \bar{x}, \]

which has zero mean and variance $\sigma_x^2/N$. Equation 2.146 then gives

\[ P[|m - \bar{x}| \ge \epsilon] \le \frac{\sigma_x^2}{N\epsilon^2}. \tag{2.149} \]

Equation 2.149 is a statement of the weak law of large numbers.

The random variable $m$ is called the sample mean. Equation 2.149
states that the probability that the sample mean will differ from the true
mean by more than $\epsilon$ approaches zero as $N$ becomes large.

The weak law of large numbers provides the mathematical justification
for our earlier interpretation of $E[x]$, or $\bar{x}$, as the number at which
the empirical average $\langle x \rangle_N$ of the results of $N$ independent experimental
trials tends to stabilize as $N$ becomes large. We need only identify the
independent random variable $x_i$ with the result of the $i$th experimental
trial, $i = 1, \ldots, N$, and the sample mean $m$ with the empirical average
$\langle x \rangle_N$. When $N$ is large, the weak law statement that with high probability
$m$ is close to the number $\bar{x}$ is interpreted in the real world as the statement
that, barring an atypical sequence of observations, the value of $\langle x \rangle_N$
will be close to the number $\bar{x}$. The possibility of observing an atypical
sequence of trials (one corresponding in the mathematical model to
$|m - \bar{x}| \ge \epsilon$) is not ruled out; but, if $\sigma_x^2/N\epsilon^2$ is small, such
sequences occur rarely.

An interesting special case of Eq. 2.149 is encountered when each
random variable $x_i$ is defined in terms of an event $A_i$ of probability $p$ by

\[ x_i(\omega) = \begin{cases} 1; & \text{for } \{\omega : \omega \text{ in } A_i\} \\
   0; & \text{for } \{\omega : \omega \text{ in } A_i^C\}. \end{cases} \tag{2.150a} \]

Then

\[ P[x_i = 1] = P[A_i] \triangleq p, \qquad P[x_i = 0] = P[A_i^C] = 1 - p, \tag{2.150b} \]

and, for $i = 1, 2, \ldots, N$, we have

\[ \bar{x}_i = p \triangleq \bar{x}, \]
\[ \overline{x_i^2} = p, \]
\[ \sigma_x^2 = \overline{x_i^2} - \bar{x}_i^2 = p(1 - p). \]

Substituting these values in Eq. 2.149 gives

\[ P[|m - p| \ge \epsilon] \le \frac{p(1 - p)}{N\epsilon^2}. \tag{2.151} \]

Let us now identify the event $A_i$ with the result that $A$ is observed on the
$i$th trial of a simple experiment. The random variable $m$ then corresponds
to the relative frequency $f_N(A)$ in a sequence of $N$ independent trials and
Eq. 2.151 may be interpreted to mean that $P[A]$ is the number on which we
expect $f_N(A)$ to converge when $N$ is large. Equation 2.151 is the result
referred to (Eq. 2.19a) in the discussion of the relation of the mathematical
model to the real world.

Chernoff Bound 

Greater insight into the weak law can be gained from a different, more
graphical derivation. Let $y_i$, $i = 1, 2, \ldots, N$, be a set of statistically
independent zero-mean random variables, each of which has the same
density function, say $p_y$, hence the same variance, say $\sigma_y^2$, which we assume
to be finite. The weak law then states

\[ P\!\left[\,\left|\frac{1}{N} \sum_{i=1}^{N} y_i\right| \ge \epsilon\right] \le \frac{\sigma_y^2}{N\epsilon^2}. \tag{2.152} \]

(Sections marked by this symbol may be omitted on a first reading.)

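We begin the new derivation of Eq. 2.152 by defining a random variable
$z$ through the transformation

\[ z \triangleq f\!\left(\sum_{i=1}^{N} y_i\right), \tag{2.153a} \]

where $f(\;)$ is the binary-valued function shown in Fig. 2.41a:

\[ f(\alpha) = \begin{cases} 0; & \text{for } |\alpha| < N\epsilon \\
   1; & \text{for } |\alpha| \ge N\epsilon. \end{cases} \tag{2.153b} \]

Figure 2.41 ... numbers. Panel (a): $f(\alpha)$.

In terms of $z$ we have

\[ P\!\left[\,\left|\frac{1}{N} \sum_{i=1}^{N} y_i\right| \ge \epsilon\right] = P[z = 1]. \]

Since $z$ can take on only the two values 0 and 1, we also have

\[ \bar{z} = 0 \cdot P[z = 0] + 1 \cdot P[z = 1]
   = P\!\left[\,\left|\frac{1}{N} \sum_{i=1}^{N} y_i\right| \ge \epsilon\right]. \tag{2.154a} \]

The expected value of $z$ equals the desired probability. By the theorem on
expectation,

\[ \bar{z} = E\!\left[f\!\left(\sum_{i=1}^{N} y_i\right)\right]. \tag{2.154b} \]

In general, there is no simple way to evaluate the right-hand side of
Eq. 2.154b for arbitrary $p_y$. The weak law bound, however, can be obtained
by noting in Fig. 2.41b that

\[ f(\alpha) \le \left(\frac{\alpha}{N\epsilon}\right)^2; \qquad \text{for all } \alpha. \tag{2.155} \]

Hence

\[ \bar{z} = E\!\left[f\!\left(\sum_{i=1}^{N} y_i\right)\right]
   \le E\!\left[\left(\frac{1}{N\epsilon} \sum_{i=1}^{N} y_i\right)^2\right]
   = \frac{1}{N^2\epsilon^2}\, E\!\left[\left(\sum_{i=1}^{N} y_i\right)^2\right]. \tag{2.156} \]

Since the $y_i$ are statistically independent and each has zero mean and
variance $\sigma_y^2$, by Eq. 2.133

\[ E\!\left[\left(\sum_{i=1}^{N} y_i\right)^2\right] = \sum_{i=1}^{N} \sigma_y^2 = N\sigma_y^2. \tag{2.157} \]

Substituting Eqs. 2.157 and 2.156 in Eq. 2.154a yields Eq. 2.152.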

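It is obvious from this derivation of Eq. 2.152 that other bounds than
the weak law can be obtained by using functions other than $(\alpha/N\epsilon)^2$ to
bound $f(\alpha)$ in Eq. 2.155. Indeed, for any function $g(\alpha)$ such that

\[ f(\alpha) \le g(\alpha); \qquad \text{all } \alpha, \]

we have

\[ P\!\left[\,\left|\frac{1}{N} \sum_{i=1}^{N} y_i\right| \ge \epsilon\right]
   \le E\!\left[g\!\left(\sum_{i=1}^{N} y_i\right)\right]. \]

Similarly, if we are interested only in a bound on the positive tail of the
random variable $(1/N) \sum_{i=1}^{N} y_i$, we may bound the one-sided step function
shown in Fig. 2.42a by any function $h(\alpha)$ and obtain

\[ \bar{z} = P\!\left[\frac{1}{N} \sum_{i=1}^{N} y_i \ge \epsilon\right]
   \le E\!\left[h\!\left(\sum_{i=1}^{N} y_i\right)\right]. \]

An especially powerful bound is obtained in this one-sided case if,
as shown in Fig. 2.42b, we take

\[ h(\alpha) = e^{\lambda(\alpha - N\epsilon)}; \qquad \lambda \ge 0,\ \epsilon > 0. \tag{2.158} \]

We then have

\[ \bar{z} \le E\!\left[h\!\left(\sum_{i=1}^{N} y_i\right)\right]
   = E\!\left[\exp\!\left(\lambda \sum_{i=1}^{N} y_i - \lambda N\epsilon\right)\right]
   = e^{-\lambda N\epsilon}\, E\!\left[\prod_{i=1}^{N} e^{\lambda y_i}\right]. \tag{2.159a} \]

Figure 2.42 Geometric construction for proof of the Chernoff bound.

Since the random variables $\{y_i\}$ are statistically independent, so are the
random variables $\{e^{\lambda y_i}\}$. Thus

\[ E\!\left[\prod_{i=1}^{N} e^{\lambda y_i}\right] = \prod_{i=1}^{N} E[e^{\lambda y_i}]
   = \left(\overline{e^{\lambda y}}\right)^N. \tag{2.159b} \]

In the last of these equations $y$ denotes any one of the identically
distributed random variables $\{y_i\}$. Substituting Eq. 2.159b in Eq. 2.159a
yields

\[ \bar{z} \le \left[\overline{e^{\lambda(y - \epsilon)}}\right]^N. \tag{2.159c} \]

Although the bound of Eq. 2.159c is valid for any $\lambda \ge 0$, we should
choose $\lambda$ in such a way that the right-hand side is minimum. We can find
this optimum choice, $\lambda_0$, by differentiating $\overline{e^{\lambda(y - \epsilon)}}$ with
respect to $\lambda$ and equating the derivative to zero:

\[ 0 = \frac{d}{d\lambda} E[e^{\lambda(y - \epsilon)}]\Big|_{\lambda = \lambda_0}
   = E\!\left[\frac{d}{d\lambda}\, e^{\lambda(y - \epsilon)}\right]_{\lambda = \lambda_0}
   = E[(y - \epsilon)\, e^{\lambda_0 (y - \epsilon)}]
   = e^{-\lambda_0 \epsilon}\, E[(y - \epsilon)\, e^{\lambda_0 y}]. \]

Canceling $e^{-\lambda_0 \epsilon}$ and rearranging gives $\lambda_0$ implicitly, that is, as
the solution to the equation

\[ \frac{E[y\, e^{\lambda_0 y}]}{E[e^{\lambda_0 y}]} = \epsilon. \tag{2.160a} \]

The bound of Eq. 2.159c then becomes

\[ \bar{z} = P\!\left[\frac{1}{N} \sum_{i=1}^{N} y_i \ge \epsilon\right]
   \le \left[\overline{e^{\lambda_0 (y - \epsilon)}}\right]^N; \qquad \epsilon > 0. \tag{2.160b} \]

It can be shown³³ that $\lambda_0$, as given by Eq. 2.160a, is always greater than or
equal to zero for $\epsilon > 0$ and that $\lambda_0$ provides the minimum (rather than
the maximum).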

The bound of Eqs. 2.160 is called the Chernoff bound.¹⁶ It can be used
whenever the numerator and denominator of Eq. 2.160a are finite, which is
the case for every discrete random variable that takes on a finite number of
values and for many continuous random variables. Though less easy to
evaluate than the weak law bound, the Chernoff bound is much more
powerful: if we define

\[ \chi \triangleq -\ln \overline{e^{\lambda_0 (y - \epsilon)}}, \tag{2.161a} \]

then Eq. 2.160b becomes

\[ P\!\left[\frac{1}{N} \sum_{i=1}^{N} y_i \ge \epsilon\right] \le e^{-N\chi}; \qquad \epsilon > 0. \tag{2.161b} \]

Thus the Chernoff bound decreases exponentially with $N$, whereas the
weak law bound decreases only as $1/N$. Furthermore, it can be shown³⁴,⁷³
that the exponent $\chi$ is as large as possible; that is to say, no bound of the
form

\[ P\!\left[\frac{1}{N} \sum_{i=1}^{N} y_i \ge \epsilon\right] \le e^{-N\chi'}; \qquad \epsilon > 0, \]

with $\chi'$ independent of $N$, is valid for all $N$ for any $\chi' > \chi$. We say that
the Chernoff bound is exponentially tight.

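We extend the Chernoff bound to a set of identically distributed,
independent random variables with nonzero means, $\{\bar{x}_i = \bar{x}\}$, by writing

\[ y_i = x_i - \bar{x}; \qquad i = 1, \ldots, N. \]

Then Eq. 2.160b becomes

\[ P\!\left[\frac{1}{N} \sum_{i=1}^{N} x_i \ge \bar{x} + \epsilon\right]
   \le \left[\overline{e^{\lambda_0 x}}\; e^{-\lambda_0 (\bar{x} + \epsilon)}\right]^N; \qquad \epsilon > 0, \tag{2.162a} \]

in which $x$ denotes any one of the identically distributed variables $\{x_i\}$.
From Eq. 2.160a $\lambda_0$ is given implicitly by

\[ \epsilon = \frac{E[(x - \bar{x})\, e^{\lambda_0 (x - \bar{x})}]}{E[e^{\lambda_0 (x - \bar{x})}]}
   = \frac{E[x\, e^{\lambda_0 x}]}{E[e^{\lambda_0 x}]} - \bar{x}. \tag{2.162b} \]

An identical derivation can be performed when $\epsilon$ is taken as a negative
constant. The result is

\[ P\!\left[\frac{1}{N} \sum_{i=1}^{N} x_i \le \bar{x} + \epsilon\right]
   \le \left[\overline{e^{\lambda_0 x}}\; e^{-\lambda_0 (\bar{x} + \epsilon)}\right]^N; \qquad \epsilon < 0, \tag{2.162c} \]

in which $\lambda_0$, now negative, is again specified implicitly by Eq. 2.162b.

We can summarize these bounds concisely by defining

\[ d \triangleq \bar{x} + \epsilon. \]

In terms of $d$,

\[ \left[\overline{e^{\lambda_0 (x - d)}}\right]^N \ge
   \begin{cases} P\!\left[\dfrac{1}{N} \sum_{i=1}^{N} x_i \ge d\right]; & d > \bar{x} \\[8pt]
   P\!\left[\dfrac{1}{N} \sum_{i=1}^{N} x_i \le d\right]; & d < \bar{x}, \end{cases} \tag{2.163a} \]

with $\lambda_0$ defined implicitly by

\[ \frac{E[x\, e^{\lambda_0 x}]}{E[e^{\lambda_0 x}]} = d. \tag{2.163b} \]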

Example. As an example of the Chernoff bound, take

\[ x_i = \begin{cases} 1, & \text{with probability } p \\
   0, & \text{with probability } 1 - p; \end{cases} \qquad i = 1, 2, \ldots, N. \tag{2.164} \]

We then have

\[ E[e^{\lambda_0 x}] = (1 - p) + p\, e^{\lambda_0}, \]
\[ E[x\, e^{\lambda_0 x}] = p\, e^{\lambda_0}. \]

We evaluate $\lambda_0$ from Eq. 2.163b: for $0 < d < 1$,

\[ \frac{1}{d} = \frac{(1 - p) + p\, e^{\lambda_0}}{p\, e^{\lambda_0}}, \]
\[ e^{-\lambda_0} = \frac{p}{1 - p}\left(\frac{1}{d} - 1\right) = \frac{p(1 - d)}{d(1 - p)}. \]

Finally,

\[ \lambda_0 = \ln \frac{d(1 - p)}{p(1 - d)}, \]

which is positive if $d > p$ and negative if $d < p$, as required.

The bracketed term on the left-hand side of Eq. 2.163a then becomes

\[ \overline{e^{\lambda_0 (x - d)}} = E[e^{\lambda_0 x}]\, e^{-\lambda_0 d}
   = \left[(1 - p) + p\, e^{\lambda_0}\right] (e^{-\lambda_0})^d \]
\[ \phantom{\overline{e^{\lambda_0 (x - d)}}}
   = \left[(1 - p) + \frac{d(1 - p)}{1 - d}\right] \left[\frac{p(1 - d)}{d(1 - p)}\right]^d
   = \left(\frac{p}{d}\right)^{d} \left(\frac{1 - p}{1 - d}\right)^{1 - d}, \tag{2.165a} \]

so that

\[ \left[\left(\frac{p}{d}\right)^{d} \left(\frac{1 - p}{1 - d}\right)^{1 - d}\right]^N \triangleq e^{-N\chi} \ge
   \begin{cases} P\!\left[\dfrac{1}{N} \sum_{i=1}^{N} x_i \ge d\right]; & 1 > d > p \\[8pt]
   P\!\left[\dfrac{1}{N} \sum_{i=1}^{N} x_i \le d\right]; & 0 < d < p. \end{cases} \tag{2.165b} \]

It is helpful to interpret the bound of Eq. 2.165b graphically. Consider

\[ \chi = -\ln\!\left[\left(\frac{p}{d}\right)^{d} \left(\frac{1 - p}{1 - d}\right)^{1 - d}\right]
   = -d \ln p - (1 - d) \ln (1 - p) + d \ln d + (1 - d) \ln (1 - d)
   = T_p(d) - H(d), \tag{2.166a} \]

where

\[ T_p(\alpha) = -\alpha \ln p - (1 - \alpha) \ln (1 - p), \tag{2.166b} \]

\[ H(\alpha) = -\alpha \ln \alpha - (1 - \alpha) \ln (1 - \alpha). \tag{2.166c} \]

The function $H(\;)$ is called the "binary entropy function." It is
tabulated²⁷ and plotted in Fig. 2.43a.

Figure 2.43 The geometric determination of the Chernoff exponent, $\chi$, for binomial
random variables. $T_p(\alpha)$ is the line tangent to $H(\alpha)$ at the point $\alpha = p$.
$\chi$ is the difference between $T_p(\alpha)$ and $H(\alpha)$ at the point $\alpha = d$.
Panels: (a) the binary entropy function $H(\alpha) = -\alpha \ln \alpha - (1 - \alpha) \ln (1 - \alpha)$;
(c) $d < p$.

It can be verified that $T_p(\;)$ is a linear function of its argument and
that $T_p(\alpha)$ and $H(\alpha)$ are equal and have the same slope at $\alpha = p$. Thus
$\chi$ is given by the geometrical constructions shown in Figs. 2.43b, c. Note
that the exponent $\chi$ increases as $|d - p|$ increases.
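
The relations of this example are simple enough to verify numerically.
The sketch below is illustrative only; the function names are chosen here
and the trial values of $d$ and $p$ are arbitrary. It evaluates $\lambda_0$, the
bracketed term of Eq. 2.165a computed from its definition, and the exponent
written as $T_p(d) - H(d)$.

```python
import math


def binary_entropy(alpha):
    """H(alpha) of Eq. 2.166c, in natural units."""
    return -alpha * math.log(alpha) - (1.0 - alpha) * math.log(1.0 - alpha)


def tangent_line(alpha, p):
    """T_p(alpha) of Eq. 2.166b."""
    return -alpha * math.log(p) - (1.0 - alpha) * math.log(1.0 - p)


def chernoff_lambda(d, p):
    """Optimum lambda_0 for the binary variable of Eq. 2.164."""
    return math.log(d * (1.0 - p) / (p * (1.0 - d)))


def chernoff_exponent(d, p):
    """Exponent of Eq. 2.166a: T_p(d) - H(d)."""
    return tangent_line(d, p) - binary_entropy(d)


def bracketed_term(d, p):
    """E[exp(lambda_0 x)] exp(-lambda_0 d), evaluated directly."""
    lam = chernoff_lambda(d, p)
    return ((1.0 - p) + p * math.exp(lam)) * math.exp(-lam * d)


for d, p in ((0.5, 0.1), (0.05, 0.1), (0.3, 0.7)):
    lam = chernoff_lambda(d, p)
    print(d, p, lam, bracketed_term(d, p), math.exp(-chernoff_exponent(d, p)))
```

In each case the last two printed numbers agree, $\lambda_0$ has the sign of
$d - p$, and the exponent vanishes when $d = p$, where the tangent line
touches the entropy curve.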

An application of the Chernoff bound. An interesting application of
the Chernoff bound in the binomial case is found in the estimation of the
probability of error that results when the discrete communication channel
shown in Fig. 2.44 is used $N$ times in succession, $N$ odd, to communicate
one of two input messages. Thus, if $m = m_0$, we transmit a sequence
of $N$ zeros over the channel; if $m = m_1$, we transmit a sequence of $N$ ones.
The receiver observes the sequence of $N$ received digits and sets $\hat{m} = m_0$
if the majority are zeros and $\hat{m} = m_1$ if the majority are ones. An error
occurs if and only if more than half the digits are received in error.

Define a set of random variables $\{x_i\}$ as

\[ x_i = \begin{cases} 1, & \text{if the } i\text{th digit is not received correctly,} \\
   0, & \text{if the } i\text{th digit is received correctly.} \end{cases} \]

Figure 2.44 A simple discrete communication channel, called the binary symmetric
channel. If the channel input is 0, the channel output is 0 with probability $(1 - p)$ and
is 1 with probability $p$. The converse statement applies when the channel input is 1.

If the channel transition probabilities are $p$ and $1 - p$ and if the
occurrence or nonoccurrence of a channel error is statistically independent
on each use of the channel, the variables $\{x_i\}$ are identical to those in
Eq. 2.164. Moreover, the probability of error for the receiver is

\[ P[\mathcal{E}] = P[\hat{m} \ne m] = P\!\left[\sum_{i=1}^{N} x_i \ge \frac{N}{2}\right]. \]

Thus we may immediately invoke Eq. 2.165b, with $d = 1 - d = \tfrac{1}{2}$. If
we assume that $p = 0.1$ and $N = 13$, the Chernoff bound yields

\[ P[\mathcal{E}] \le \left[\left(\frac{0.1}{0.5}\right)^{0.5} \left(\frac{0.9}{0.5}\right)^{0.5}\right]^{13}
   = (0.6)^{13} \approx 1.3 \times 10^{-3}. \]

On the other hand, substituting $\epsilon = (d - p) = 0.4$ in the weak law bound
yields

\[ P[\mathcal{E}] \le \frac{p(1 - p)}{N\epsilon^2} = \frac{0.1(0.9)}{13(0.4)^2} = 0.043. \]

The comparison between the strength of the two bounds is more dramatic
if we triple $N$ to 39. The Chernoff bound is then cubed to yield $2.2 \times 10^{-9}$,
whereas the weak law bound is divided by three to yield 0.014.
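
The two bounds can be set beside the exact error probability, which for
this receiver is a binomial tail. The following is a sketch with
illustrative function names; it assumes, as in the text, independent
channel errors of probability $p$ and a decision at $d = \tfrac{1}{2}$.

```python
import math


def exact_majority_error(n, p):
    """P[more than half of n digits are in error], n odd (binomial tail)."""
    first = (n + 1) // 2
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(first, n + 1))


def chernoff_bound(n, p, d=0.5):
    """Chernoff bound of Eq. 2.165b for the case d > p."""
    return ((p / d) ** d * ((1.0 - p) / (1.0 - d)) ** (1.0 - d)) ** n


def weak_law_bound(n, p, d=0.5):
    """Weak law bound of Eq. 2.151 with epsilon = d - p."""
    eps = d - p
    return p * (1.0 - p) / (n * eps * eps)


for n in (13, 39):
    print(n, exact_majority_error(n, 0.1), chernoff_bound(n, 0.1),
          weak_law_bound(n, 0.1))
```

For both block lengths the exact probability lies below the Chernoff
bound, which in turn lies far below the weak law bound; the gap between
the two bounds widens rapidly as $N$ grows.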


Central Limit Theorem 

We noted in connection with Fig. 2.7 that the binomial density function
(that is, the density function of the sample mean

\[ m \triangleq \frac{1}{M} \sum_{k=1}^{M} x_k \tag{2.167} \]

in the particular case for which the $\{x_k\}$ are statistically independent binary
random variables, each with mean $\bar{x}$ and variance $\sigma_x^2$) exhibits an envelope
that becomes simultaneously narrower and more bell-shaped as $M$
increases. The fact that the envelope becomes narrower is attributable to
the normalization factor $1/M$ in Eq. 2.167: as $M$ increases, the mean
$\bar{m} = \bar{x}$ remains constant, whereas the variance $\sigma_m^2 = \sigma_x^2/M$ decreases.
We are interested here in investigating the tendency of the envelope to
become bell-shaped. Consequently, instead of $m$, we consider the related
random variable $z$ defined by

\[ z \triangleq \frac{1}{\sqrt{N}} \sum_{i=1}^{N} (x_i - \bar{x}). \tag{2.168} \]

With this normalization $\bar{z} = 0$ and $\sigma_z^2 = \sigma_x^2$, so that both the mean and
the variance of $z$ remain constant as $N$ increases. The behavior of the
envelope of $p_z$ as $N$ increases is evidenced in Fig. 2.45.

The bell-shaped tendency illustrated in Fig. 2.45 for the binomial
distribution is an example of a much more general group of theorems,
called collectively the central limit theorem, one statement† of which reads
as follows:

Let $\{y_i\}$ denote a set of statistically independent, zero-mean random
variables, each with the same density function $p_{y_i} = p_y$ and finite
variance $\sigma_y^2$. Define

\[ z \triangleq \frac{1}{\sqrt{N}} \sum_{i=1}^{N} y_i. \tag{2.169a} \]

Then, for any $a$,

\[ \lim_{N \to \infty} F_z(a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}\,\sigma_y}\, e^{-\beta^2/2\sigma_y^2}\, d\beta. \tag{2.169b} \]

† The particular limit theorem stated here is called the Lindeberg-Levy theorem. This
and other related theorems are discussed in References 30 and 35.

Figure 2.45 The $N$-term binomial density function normalized to zero-mean and
constant variance. Panels: (a) $p = 0.5$, $\sigma = 0.5$, $N = 16$; (c) $p = 0.5$, $\sigma = 0.5$,
$N = 400$; (d) $p = 0.1$, $\sigma = 0.3$, $N = 16$; (f) $p = 0.1$, $\sigma = 0.3$, $N = 400$.

As a consequence, for any two numbers $a$ and $b$

\[ \lim_{N \to \infty} \int_{a}^{b} p_z(\alpha)\, d\alpha
   = \int_{a}^{b} \frac{1}{\sqrt{2\pi}\,\sigma_y}\, e^{-\beta^2/2\sigma_y^2}\, d\beta \tag{2.170a} \]

or, when $b = +\infty$,

\[ \lim_{N \to \infty} \int_{a}^{\infty} p_z(\alpha)\, d\alpha = Q\!\left(\frac{a}{\sigma_y}\right). \tag{2.170b} \]

Since the choice does not affect the right-hand side, the integration interval
of Eq. 2.170a may be chosen either to include or exclude the points $a$
and $b$.

Discussion. The central limit theorem does not imply that $p_z$ itself approaches the Gaussian density function; it does imply that the integral of $p_z(\alpha)$ between fixed limits approaches a value given by the integral of the Gaussian density function. The distinction is clear if we consider $p_y$ to be binomial; for any N, no matter how large, $p_z$ is a sum of impulses and therefore never approximates the (smooth) Gaussian density function.

The central limit theorem is operationally useful in estimating such probabilities as

$$P\!\left[\frac{1}{\sqrt{N}}\sum_{i=1}^{N} y_i \ge a\right]$$

when N is finite but very large and $|a/\sigma_y|$ is a relatively small constant (independent of N). Quantitative evaluation of the words “very large” and “relatively small” depends on the details of the original density function $p_y$: if $p_y$ itself is Gaussian, the central limit theorem is exact for any N and $|a/\sigma_y|$. An equally trivial counterexample is the binomial case: if each $y_i$ assumes only the values −1 and 1, and if a is any number greater than $\sqrt{N}$,

$$P\!\left[\frac{1}{\sqrt{N}}\sum_{i=1}^{N} y_i \ge a\right] = 0,$$

whereas the Gaussian estimate $Q(a/\sigma_y)$ is strictly positive.



In estimating probabilities in which $a/\sigma_y$ grows with N, such as

$$P\!\left[\frac{1}{\sqrt{N}}\sum_{i=1}^{N} y_i \ge \sqrt{N}\,\epsilon\right] = P\!\left[\frac{1}{N}\sum_{i=1}^{N} y_i \ge \epsilon\right], \tag{2.171a}$$

the central limit approximation

$$Q\!\left(\frac{\sqrt{N}\,\epsilon}{\sigma_y}\right) \tag{2.171b}$$

is dubious, regardless of how large we take N. Consider, for example, a set of N binary random variables $\{x_i\}$ in which, for each i, $x_i$ assumes the values 0 and 1 with equal probability. With $y_i = x_i - \tfrac{1}{2}$, $\sigma_y^2 = \tfrac{1}{4}$, and $\epsilon = \tfrac{1}{2}$ we obtain from Eq. 2.171b

$$P\!\left[\frac{1}{N}\sum_{i=1}^{N} x_i \ge 1\right] \approx Q(\sqrt{N}). \tag{2.172a}$$

We have already seen (cf. Eq. 2.121) that the Q function behaves exponentially as

$$Q(a) \sim e^{-a^2/2}; \quad a > 0.$$

Thus Eq. 2.172a implies

$$P\!\left[\frac{1}{N}\sum_{i=1}^{N} x_i \ge 1\right] \approx e^{-N/2}, \tag{2.172b}$$

whereas the exact expression is

$$P\!\left[\frac{1}{N}\sum_{i=1}^{N} x_i \ge 1\right] = P\!\left[\frac{1}{N}\sum_{i=1}^{N} x_i = 1\right] = 2^{-N}. \tag{2.173}$$

The difference between Eqs. 2.172b and 2.173 may be operationally very significant; indeed the fractional error

$$\frac{e^{-0.5N}}{2^{-N}} = e^{+0.19N}$$

grows with N and becomes enormous when N is large. On the other hand, it is readily verified that the Chernoff bound agrees with Eq. 2.173, which is in accord with our earlier statement that the Chernoff bound is exponentially tight. Thus the Chernoff bound should be used in lieu of the central limit calculation in cases such as this, in which the limit of integration in Eq. 2.170b increases with N.
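The growth of the fractional error is easy to see numerically. The sketch below is illustrative only: it evaluates the central limit estimate $Q(\sqrt{N})$ directly (rather than its exponential approximation $e^{-N/2}$) and compares it with the exact value $2^{-N}$; the function names are hypothetical.

```python
import math


def q_function(a: float) -> float:
    """Gaussian tail probability Q(a) for a zero-mean, unit-variance variable."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def clt_estimate(n: int) -> float:
    """Central limit estimate Q(sqrt(n)) of P[all n fair binary digits equal 1]."""
    return q_function(math.sqrt(n))


def exact_probability(n: int) -> float:
    """Exact probability 2^(-n) of the same event."""
    return 2.0 ** (-n)


def fractional_error(n: int) -> float:
    """Ratio of the central limit estimate to the exact probability."""
    return clt_estimate(n) / exact_probability(n)


if __name__ == "__main__":
    for n in (10, 50, 100, 200):
        print(n, clt_estimate(n), exact_probability(n), fractional_error(n))
```

For small N the two values are comparable, but the ratio grows roughly exponentially in N, in line with the factor $e^{+0.19N}$.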


Argument. An appreciation of the validity of the central limit theorem 
can be gained from the following arguments. 

Let $M_y(\nu)$ denote the characteristic function of any one of the N identically distributed zero-mean random variables $\{y_i\}$, and let $M_z(\nu)$ denote the characteristic function of their normalized sum z. Then $M_z(\nu)$ and $M_y(\nu)$ are related by

$$M_z(\nu) \triangleq E\!\left[e^{j\nu z}\right] = E\!\left[\exp\!\left(\frac{j\nu}{\sqrt{N}}\sum_{i=1}^{N} y_i\right)\right] = \prod_{i=1}^{N} E\!\left[\exp\!\left(\frac{j\nu y_i}{\sqrt{N}}\right)\right] = \left[M_y\!\left(\frac{\nu}{\sqrt{N}}\right)\right]^{N}, \tag{2.174}$$

in which we have used the fact that the mean of a product of statistically independent random variables is the product of their means.

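Now let us assume that $p_y$ is such that every moment $\overline{y^n}$, n = 1, 2, ..., is finite. Then, in accordance with Eq. 2.144b, $M_y(\nu)$ may be expressed in the power-series expansion

$$M_y(\nu) = 1 + (j\nu)\,\bar{y} + \frac{(j\nu)^2}{2!}\,\overline{y^2} + \frac{(j\nu)^3}{3!}\,\overline{y^3} + \cdots. \tag{2.175a}$$

Since $\bar{y} = 0$ and $\overline{y^2} = \sigma_y^2$, we have

$$M_y(\nu) = 1 - \frac{\nu^2\sigma_y^2}{2} + \nu^3 f(\nu), \tag{2.175b}$$

where $f(\nu)$ is a continuous function that approaches the constant $(-j\,\overline{y^3}/6)$ as $\nu$ approaches zero.

From Eqs. 2.174 and 2.175b, we have

$$\ln M_z(\nu) = N \ln M_y\!\left(\frac{\nu}{\sqrt{N}}\right) = N \ln\!\left[1 - \frac{\nu^2\sigma_y^2}{2N} + \left(\frac{\nu}{\sqrt{N}}\right)^{3} f\!\left(\frac{\nu}{\sqrt{N}}\right)\right]. \tag{2.176}$$

The logarithm may be expanded in the power series

$$\ln(1 + w) = w - \frac{w^2}{2} + \frac{w^3}{3} - \cdots, \tag{2.177}$$

which converges for any complex variable w for which |w| < 1. Since we are interested in the limit as $N \to \infty$, we may take N sufficiently large that, for any fixed $\nu$,

$$\left| -\frac{\nu^2\sigma_y^2}{2N} + \left(\frac{\nu}{\sqrt{N}}\right)^{3} f\!\left(\frac{\nu}{\sqrt{N}}\right) \right| < 1.$$

Applying Eq. 2.177 to Eq. 2.176, we have

$$\ln\!\left[1 - \frac{\nu^2\sigma_y^2}{2N} + \left(\frac{\nu}{\sqrt{N}}\right)^{3} f\!\left(\frac{\nu}{\sqrt{N}}\right)\right] = -\frac{\nu^2\sigma_y^2}{2N} + \left(\frac{\nu}{\sqrt{N}}\right)^{3} f\!\left(\frac{\nu}{\sqrt{N}}\right) + \left(\text{terms involving the factors } N^{-k},\ k \ge 2\right).$$

Thus, for any finite value of $\nu$,

$$\lim_{N\to\infty} \ln M_z(\nu) = \lim_{N\to\infty} N\left[-\frac{\nu^2\sigma_y^2}{2N} + \left(\frac{\nu}{\sqrt{N}}\right)^{3} f\!\left(\frac{\nu}{\sqrt{N}}\right) + \cdots\right] = -\frac{\nu^2\sigma_y^2}{2}.$$

Since the exponential function is continuous, it follows that

$$\lim_{N\to\infty} M_z(\nu) = e^{-\nu^2\sigma_y^2/2}. \tag{2.178}$$

We recognize that the limiting form of $M_z(\nu)$ is the characteristic function of a zero-mean Gaussian random variable with variance $\sigma_y^2$.

We must now resist the temptation to claim that the density function

$$\lim_{N\to\infty} p_z(\alpha) = \lim_{N\to\infty} \frac{1}{2\pi}\int_{-\infty}^{\infty} M_z(\nu)\, e^{-j\nu\alpha}\, d\nu \tag{2.179}$$

is Gaussian. As we have already seen in connection with the binomial distribution, such a claim is false! The operations of limit taking and integration in Eq. 2.179 cannot, in general, be interchanged.

Although the density function of z does tend to Gaussian if $p_y$ is sufficiently smooth, the general central limit theorem statement that the distribution function converges to Gaussian form hinges on the additional “smoothing” that is introduced by integrating the density function $p_z$ to get the distribution function $F_z$.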
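The convergence of the characteristic function can be illustrated with a case that is not Gaussian to begin with. The sketch below is an assumption-laden example, not part of the original argument: it takes each $y_i$ to be +1 or −1 with equal probability, so that $M_y(\nu) = \cos\nu$ and $\sigma_y = 1$, and compares $[M_y(\nu/\sqrt{N})]^N$ with the Gaussian limit of Eq. 2.178. The function names are hypothetical.

```python
import math


def mz_binary(v: float, n: int) -> float:
    """[M_y(v / sqrt(n))]^n for y = +1 or -1 with equal probability (M_y(v) = cos v)."""
    return math.cos(v / math.sqrt(n)) ** n


def gaussian_cf(v: float, sigma: float = 1.0) -> float:
    """Characteristic function exp(-v^2 sigma^2 / 2) of a zero-mean Gaussian variable."""
    return math.exp(-((v * sigma) ** 2) / 2.0)


if __name__ == "__main__":
    v = 2.0
    for n in (10, 100, 1000, 10000):
        print(n, mz_binary(v, n), gaussian_cf(v))
```

Even though the characteristic functions converge, the density of z in this example remains a train of impulses for every N, which is exactly the distinction the text draws.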


APPENDIX 2A REVERSIBLE TRANSFORMATION OF 
RANDOM VECTORS 

The change-of-variables transformation considered in Eq. 2.78 is a special case of a reversible transformation of vectors. A transformation $\mathbf{x} \to \mathbf{y}$, with both $\mathbf{x}$ and $\mathbf{y}$ k-dimensional vectors, is called reversible if it is one-to-one, that is, if the inverse transformation $\mathbf{y} \to \mathbf{x}$ also exists for all $\mathbf{x}$ and $\mathbf{y}$ of interest. For example, let

$$\mathbf{y} = \bigl(f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})\bigr), \tag{2A.1a}$$

where each of the $\{f_i\}$ is a function of k variables; that is, each $f_i$ assigns a (different) number, say $y_i(\omega)$, to a vector $\mathbf{x}(\omega)$. The transformation is reversible if there exists another set of functions $\{g_i\}$ such that

$$\mathbf{x} = \bigl(g_1(\mathbf{y}), g_2(\mathbf{y}), \ldots, g_k(\mathbf{y})\bigr). \tag{2A.1b}$$

It is convenient to express Eqs. 2A.1 in the more concise form

$$\mathbf{y} = \mathbf{f}(\mathbf{x}) \tag{2A.2a}$$

$$\mathbf{x} = \mathbf{g}(\mathbf{y}) = \mathbf{g}(\mathbf{f}(\mathbf{x})). \tag{2A.2b}$$

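We now relate $p_{\mathbf{y}}$ to $p_{\mathbf{x}}$ for a reversible transformation in which the partial derivatives $\partial f_i/\partial x_j$ and $\partial g_i/\partial y_j$ exist for all i and j, $1 \le i, j \le k$. First we determine the probability distribution function $F_{\mathbf{y}}$ and then we differentiate $F_{\mathbf{y}}$ to obtain $p_{\mathbf{y}}$. By definition,

$$F_{\mathbf{y}}(\boldsymbol{\beta}) = \int_I p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha}, \tag{2A.3a}$$

where I is the region

$$I = \{\boldsymbol{\alpha}: f_1(\boldsymbol{\alpha}) \le \beta_1,\ f_2(\boldsymbol{\alpha}) \le \beta_2,\ \ldots,\ f_k(\boldsymbol{\alpha}) \le \beta_k\}. \tag{2A.3b}$$

Taking the derivative $\partial^k/(\partial\beta_1\, \partial\beta_2 \cdots \partial\beta_k)$ of the right-hand side of Eq. 2A.3a to obtain $p_{\mathbf{y}}(\boldsymbol{\beta})$ is complicated by the fact that I is not simply expressed in terms of the variables of integration. This difficulty can be avoided by making the change of variables

$$\boldsymbol{\gamma} = \mathbf{f}(\boldsymbol{\alpha}). \tag{2A.4a}$$

Then it follows from the existence of the inverse transform $\mathbf{g}$ that

$$\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\gamma}). \tag{2A.4b}$$

The region of integration I can be expressed simply in terms of $\boldsymbol{\gamma}$ as

$$I = \{\boldsymbol{\gamma}: \boldsymbol{\gamma} \le \boldsymbol{\beta}\}. \tag{2A.4c}$$

Since $\mathbf{g}(\boldsymbol{\gamma})$ may be substituted for $\boldsymbol{\alpha}$ in the integrand of Eq. 2A.3a, the only problem in performing the change of variables of Eq. 2A.4a is to relate the differential volume elements $d\boldsymbol{\alpha}$ and $d\boldsymbol{\gamma}$. The relationship is

$$d\boldsymbol{\alpha} = |J_{\mathbf{g}}(\boldsymbol{\gamma})|\, d\boldsymbol{\gamma}, \tag{2A.5a}$$

where $|J_{\mathbf{g}}(\boldsymbol{\gamma})|$ is the absolute value of the Jacobian $J_{\mathbf{g}}(\boldsymbol{\gamma})$ associated with the transformation $\mathbf{g}$. The Jacobian, by definition, is the determinant

$$J_{\mathbf{g}}(\boldsymbol{\gamma}) \triangleq \begin{vmatrix} J_{11} & J_{12} & \cdots & J_{1k} \\ J_{21} & J_{22} & \cdots & J_{2k} \\ \vdots & & & \vdots \\ J_{k1} & J_{k2} & \cdots & J_{kk} \end{vmatrix} \tag{2A.5b}$$

with elements

$$J_{il} = \frac{\partial g_i(\boldsymbol{\gamma})}{\partial \gamma_l}; \quad l = 1, 2, \ldots, k;\ i = 1, 2, \ldots, k. \tag{2A.5c}$$

With the change of variables of Eq. 2A.4a, Eq. 2A.3a becomes

$$F_{\mathbf{y}}(\boldsymbol{\beta}) = \int_{\boldsymbol{\gamma} \le \boldsymbol{\beta}} p_{\mathbf{x}}[\mathbf{g}(\boldsymbol{\gamma})]\, |J_{\mathbf{g}}(\boldsymbol{\gamma})|\, d\boldsymbol{\gamma} = \int_{-\infty}^{\beta_1}\int_{-\infty}^{\beta_2} \cdots \int_{-\infty}^{\beta_k} p_{\mathbf{x}}[\mathbf{g}(\boldsymbol{\gamma})]\, |J_{\mathbf{g}}(\boldsymbol{\gamma})|\, d\boldsymbol{\gamma}. \tag{2A.6}$$

Taking the partial derivative is now trivial, and we obtain the desired relation between $p_{\mathbf{y}}$ and $p_{\mathbf{x}}$ when random vectors $\mathbf{y}$ and $\mathbf{x}$ are related by the 1:1 transformations $\mathbf{y} = \mathbf{f}(\mathbf{x})$; $\mathbf{x} = \mathbf{g}(\mathbf{y})$:

$$p_{\mathbf{y}}(\boldsymbol{\beta}) = p_{\mathbf{x}}[\mathbf{g}(\boldsymbol{\beta})]\, |J_{\mathbf{g}}(\boldsymbol{\beta})|. \tag{2A.7}$$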

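Further insight into the relation between $p_{\mathbf{x}}$ and $p_{\mathbf{y}}$ may be gained by recalling the fundamental interpretation of the probability density function: $p_{\mathbf{x}}$ is the function which, when evaluated at a point $\mathbf{a}$ and multiplied by the volume $\Delta V_x$ of a small region $\Delta I_x$ including the point $\mathbf{a}$, yields the probability that $\mathbf{x}$ will lie in the region. But, if $\mathbf{x}$ lies in the region $\Delta I_x$, then $\mathbf{y} = \mathbf{f}(\mathbf{x})$ must lie in a corresponding region $\Delta I_y$, of volume $\Delta V_y$, which contains the point $\mathbf{b} = \mathbf{f}(\mathbf{a})$. Thus

$$p_{\mathbf{y}}(\mathbf{b})\, \Delta V_y \approx p_{\mathbf{x}}(\mathbf{a})\, \Delta V_x. \tag{2A.8a}$$

Since $\mathbf{a} = \mathbf{g}(\mathbf{b})$, we have

$$p_{\mathbf{y}}(\mathbf{b})\, \Delta V_y \approx p_{\mathbf{x}}[\mathbf{g}(\mathbf{b})]\, \Delta V_x. \tag{2A.8b}$$

Of course, $\Delta V_x$ is not in general equal to $\Delta V_y$; indeed, from Eq. 2A.5a,

$$\Delta V_x \approx |J_{\mathbf{g}}(\mathbf{b})|\, \Delta V_y. \tag{2A.8c}$$

Substituting Eq. 2A.8c in Eq. 2A.8b yields

$$p_{\mathbf{y}}(\mathbf{b}) \approx p_{\mathbf{x}}[\mathbf{g}(\mathbf{b})]\, |J_{\mathbf{g}}(\mathbf{b})|, \tag{2A.8d}$$

which is consistent with Eq. 2A.7.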

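As an example of the use of Eq. 2A.7, consider the polar transformation $\mathbf{x} \to \mathbf{y}$ given by

$$y_1 = f_1(\mathbf{x}) = \sqrt{x_1^2 + x_2^2}, \qquad y_2 = f_2(\mathbf{x}) = \tan^{-1}\frac{x_2}{x_1}. \tag{2A.9a}$$

As shown in Fig. 2A.1, the inverse transformation is

$$x_1 = g_1(\mathbf{y}) = y_1 \cos y_2, \qquad x_2 = g_2(\mathbf{y}) = y_1 \sin y_2, \tag{2A.9b}$$

for which the Jacobian is

$$J_{\mathbf{g}}(\mathbf{y}) = \begin{vmatrix} \cos y_2 & -y_1 \sin y_2 \\ \sin y_2 & y_1 \cos y_2 \end{vmatrix} = y_1. \tag{2A.9c}$$

Introducing the more natural notation $\boldsymbol{\rho} = (r, \theta)$, so that

$$J_{\mathbf{g}}(\boldsymbol{\rho}) = r, \qquad g_1(\boldsymbol{\rho}) = r\cos\theta, \qquad g_2(\boldsymbol{\rho}) = r\sin\theta, \tag{2A.10a}$$

we have

$$p_{\mathbf{y}}(\boldsymbol{\rho}) = p_{\mathbf{x}}[\mathbf{g}(\boldsymbol{\rho})]\, |J_{\mathbf{g}}(\boldsymbol{\rho})|,$$

or

$$p_{\mathbf{y}}(r, \theta) = p_{\mathbf{x}}(r\cos\theta, r\sin\theta)\, r; \quad r \ge 0,\ 0 \le \theta < 2\pi. \tag{2A.10b}$$

Figure 2A.1 The polar transformation.

For instance, if $p_{\mathbf{x}}$ is the two-dimensional Gaussian density function

$$p_{\mathbf{x}}(\alpha_1, \alpha_2) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\!\left[-\frac{\alpha_1^2 - 2\rho\alpha_1\alpha_2 + \alpha_2^2}{2(1-\rho^2)}\right], \tag{2A.11a}$$

then, from Eq. 2A.10b,

$$p_{\mathbf{y}}(r, \theta) = \frac{r}{2\pi\sqrt{1-\rho^2}} \exp\!\left[-\frac{r^2(1 - 2\rho\sin\theta\cos\theta)}{2(1-\rho^2)}\right]; \quad r \ge 0,\ 0 \le \theta < 2\pi. \tag{2A.11b}$$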
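A quick sanity check on the polar result is that the transformed density still integrates to one. The sketch below is illustrative only: it assumes the unit-variance correlated Gaussian form of Eq. 2A.11a, multiplies by the Jacobian r as in Eq. 2A.10b, and integrates over r and θ with a simple midpoint rule. The function names, the truncation radius, and the grid size are hypothetical choices.

```python
import math


def gaussian_xy(a1: float, a2: float, rho: float) -> float:
    """Zero-mean, unit-variance bivariate Gaussian density with correlation rho."""
    norm = 2.0 * math.pi * math.sqrt(1.0 - rho ** 2)
    quad = (a1 * a1 - 2.0 * rho * a1 * a2 + a2 * a2) / (2.0 * (1.0 - rho ** 2))
    return math.exp(-quad) / norm


def polar_density(r: float, theta: float, rho: float) -> float:
    """p_y(r, theta) = p_x(r cos theta, r sin theta) * r, the Jacobian factor being r."""
    return gaussian_xy(r * math.cos(theta), r * math.sin(theta), rho) * r


def total_probability(rho: float, r_max: float = 8.0, steps: int = 400) -> float:
    """Midpoint-rule integral of polar_density over 0 <= r <= r_max, 0 <= theta < 2*pi."""
    dr = r_max / steps
    dth = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        for k in range(steps):
            total += polar_density(r, (k + 0.5) * dth, rho)
    return total * dr * dth


if __name__ == "__main__":
    print(total_probability(0.5))
```

Omitting the factor r in `polar_density` makes the integral differ from one, which shows why the Jacobian cannot be dropped.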


PROBLEMS 

2.1 Let A, B, C be three events, not necessarily disjoint, defined on a sample space Ω. Prove the three inequalities stated below, and for each discuss the conditions under which the equality sign holds for every legitimate probability assignment.

$$P[A \cup B \cup C] \le P[A] + P[B] + P[C],$$

$$P[A \cup B \cup C] \ge P[A],$$

$$P[ABC] \le P[A].$$





2.2 Let Ω be the integers 1, 2, ..., 10 and let each integer be assigned probability 1/10. Define the events A, B, C by

A = {1, 2, 3, 4, 5},

B = {4, 5, 6, 7, 8},

C = {3, 5, 7, 9, 10}.

a. Calculate the following probabilities:

$P[A \cup B^c]$, $P[A \cap C]$, $P[(A \cup B)^c \cap C]$,

$P[(AB) \cup C]$, $P[(AB)^c \cup (AC)]$.

b. What is the total number of distinct events implied by the events Ω, A, B, C?

2.3 Consider the probability system of Problem 2.2. Are the following 
equations true? 

P[A | BC] = P[A],

P[B | AC] = P[B],

P[C | AB] = P[C].

Are the three events A, B, and C jointly statistically independent? Are they pair- 
wise statistically independent? 

2.4 Consider the following experiment involving four urns. A ball is chosen 
from urn A, which contains six balls labeled B, three balls labeled C, and three 
balls labeled D. The letter drawn specifies the urn from which a second drawing 
is made. Urn B contains five red and five white balls. Urn C contains four red 
and six white balls. Urn D contains two red and eight white balls. 

a. Construct a sample space and probability assignment that describes the 
experiment. 

b. Given that the second ball drawn is red, what is the conditional probability that the first drawing yielded B?

c. Are the two events “first ball labeled C” and “second ball red” independent?

d. Are all results of the first drawing statistically independent of the result of 
the second drawing? 

2.5 Let A and B be two statistically independent events of nonzero probability. 
Prove or disprove the equation 

$$P[A \cup B] = P[A] + P[B].$$


2.6 Consider any event A with nonzero probability and a set of disjoint events $B_1, B_2, \ldots, B_n$ such that

$$\bigcup_{i=1}^{n} B_i = \Omega.$$

Show that

$$P[B_k \mid A] = \frac{P[A \mid B_k]\, P[B_k]}{\sum_{i=1}^{n} P[A \mid B_i]\, P[B_i]}.$$

This result is known as Bayes rule (for events).

2.7 An experiment consists of throwing a fair die until two successive results are the same. Construct a mathematical model that describes the experiment and determine the probability of stopping with the nth toss, n = 0, 1, 2, .... Verify that these probabilities sum to one.

2.8 A communication network with four terminals I, II, III, IV is connected with four links a, b, c, d, as shown in the figure. Not all links, however, are necessarily available. Let p denote the probability that any particular link is available and assume that the availability of each link is statistically independent of the state of all other links. Two stations can communicate if and only if they are connected by at least one chain of available links.

a. Construct an appropriate probability model with 16 sample points, one for each state of the system.

b. Let A = {ω: I and IV can communicate}. Calculate P[A].

c. Let B = {ω: II and III can communicate}. Calculate P[B].

d. Calculate P[AB]. How many sample points does this event contain? Are the events A and B statistically independent?

e. Show that P[A] = p P[A | c available] + (1 − p) P[A | c not available]. Using this formula, re-evaluate P[A] by inspection.

f. Prove that P[A] would be increased if link c were connected between I and III rather than between II and III.

Figure P2.8

2.9 Consider three events $A_1$, $B_1$, and $C_1$, with complements $A_2$, $B_2$, and $C_2$, respectively. Prove that $A_1$, $B_1$, and $C_1$ are statistically independent if and only if the eight equations

$$P[A_i B_j C_k] = P[A_i]\, P[B_j]\, P[C_k]; \quad i, j, k = 1, 2$$

are true. Does any subset of these equations imply the others? If so, determine a minimal subset with this property.

2.10 Consider the communication system described here. The transmitter throws one of two fair dice, die I if the message is A and die II if the message is B. Die I has five faces labeled A and one face labeled B, whereas die II has five faces labeled B and one face labeled A. The receiver decides the message is that shown by the thrown die. Assume the two messages are equally likely.

a. Construct a suitable probability system and determine the probability of 
error (i.e., the probability that the receiver’s decision is incorrect). 

b. Now assume that the transmitter throws three type I dice if the message 
is A and three type II dice if the message is B. The receiver decides the message 
by majority rule. Repeat part a. 

c. What is the general expression for the probability of error when the trans- 
mitter throws N (N odd) type I or type II dice and the receiver decides by majority 
rule? 

d. The transmitter now throws four type I or type II dice. The receiver again decides by majority rule but asks the transmitter to throw the same four dice again in case of a tie. This continues until a decision is reached. What is the probability of error? What is the probability that the decision will be reached with the Nth repetition?

2.11 A noisy discrete communication channel is available. Once each second
one letter from the three-letter alphabet {a, b, c} can be transmitted and one 
letter from the three-letter alphabet {1, 2, 3}, received. The conditional proba- 
bilities of the various received letters, given the various transmitted letters, are 
specified by the accompanying diagram. 



Figure P2.11

A source is available that uses a, b, and c with the following probabilities: 

P[a] = 0.3,

P[b] = 0.5,

P[c] = 0.2.

What is the best receiver decision rule (assignment of 1, 2, 3 to a, b, c) and what is the resulting probability of error? What is the minimum probability of error that could be achieved without use of the channel?

2.12 Consider the noisy discrete communication channel illustrated by the accompanying diagram.

Figure P2.12

a. If P[0] = 0.7 and P[l] = 0.3, determine the optimum decision rule (assign- 
ment of a, b, c to 0, 1) and the resulting probability of error. 

b. There are eight decision rules. Plot the probability of error for each 
decision rule versus P[0] on one graph. 

c. Each decision rule has a maximum probability of error, which occurs for
some least favorable a priori probability P[0]. The decision rule which has the 
smallest maximum probability of error is called the minimax decision rule. 
Which of the eight rules is minimax? 

2.13 Let $A_i$, i = 1, 2, ..., K, be a set of disjoint events such that

$$\bigcup_{i=1}^{K} A_i = \Omega.$$

a. Prove that, for all α, β,

$$p_{x,y}(\alpha, \beta) = \sum_{i=1}^{K} p_{x,y}(\alpha, \beta \mid A_i)\, P[A_i].$$

b. Express the following without the use of integrals: 


r r pu+t.*** 

J — CO J— CO 


/>*,„(«» P I A) da dp 


px( a I y = P, A ) PvW I A ) d P’ 


where A is any one of the $\{A_i\}$.


2.14 Two statistically dependent random variables $x_1$ and $x_2$ are applied at the inputs to a threshold detector, the output from which is equal to the number of inputs that exceed the threshold, say T. Thus y = 0, 1, or 2. Determine the density function $p_y$ in terms of $p_{x_1,x_2}$ and T. (See Fig. P2.14.)

Figure P2.14 Threshold detector (T).

2.15 Consider the random variable z obtained from a random variable θ by
the transformation


a. Determine p z when 


/>*(«) = 


;■ -2 <1< r 

0; elsewhere. 


b. Determine $p_z$ when $\phi$ is a constant, N is an integer, and

$$p_\theta(\alpha) = \begin{cases} \dfrac{1}{2N\pi}; & -N\pi + \phi < \alpha < N\pi + \phi, \\ 0; & \text{elsewhere.} \end{cases}$$

2.16 A random variable x with nonimpulsive density function $p_x$ is transformed into a new random variable y by the transformation

$$y = f(x),$$

where

$$f(\alpha) = F_x(\alpha).$$

Here, as usual, $F_x$ denotes the distribution function of x. Show that

$$p_y(\alpha) = \begin{cases} 1; & 0 \le \alpha \le 1, \\ 0; & \text{elsewhere.} \end{cases}$$


2.17 Let x be a random variable with the density function

$$p_x(\alpha) = \tfrac{1}{2}\, e^{-|\alpha|}.$$

Determine the density function $p_y$ of the random variable


2.18 A random variable x with probability density function as shown is applied at the input of each of the five nonlinear devices illustrated on p. 120. Calculate and plot the resulting probability density $p_{y_i}$ for i = 1, 2, 3, 4, 5.

Figure P2.18 Recoverable panel titles: (d) Amplifier; (e) Uniform quantizer.

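2.19 Let x and y be statistically independent random variables with the probability density functions

$$p_x(\alpha) = \begin{cases} \dfrac{1}{\pi\sqrt{1-\alpha^2}}; & |\alpha| < 1, \\ 0; & \text{elsewhere,} \end{cases}$$

$$p_y(\beta) = \begin{cases} \beta\, e^{-\beta^2/2}; & \beta \ge 0, \\ 0; & \text{elsewhere.} \end{cases}$$

Show that the product z = xy has a Gaussian density function.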

2.20 A noise process is studied and the probability that k “zero-crossings” occur in a time interval [0, a] is denoted P(k, a). For example, exactly three “zero-crossings” occur in the time interval [0, 3] for the noise waveform pictured. For all values of a

$$\sum_{k=0}^{\infty} P(k, a) = 1.$$

Let the random variable τ denote the time at which the first zero crossing occurs in the interval [0, ∞]. Express $p_\tau$ in terms of P(k, a). Hint. First calculate $F_\tau(a)$, the probability distribution function of τ evaluated at a.

Figure P2.20


2.21 A random variable y with density function 

6a-' 1+6 >; a > 1, 

=0; cc < 1, 


is obtained by means of a transformation y 
with density function 


/>«(“) = 



= fix) from the random variable x 

a > 0, 

a < 0. 


Determine a reversible transformation / compatible with the specified density 
functions. 

2.22 Let x and y be statistically independent random variables. New random 
variables u and v are defined by 

u = ax + b, 
v = cy + d, 

where a, b, c, and d are constants. Show that the random variables u and v are 
also statistically independent. 

2.23 A communication system is used to transmit one of two equally likely 
messages, $m_0$ and $m_1$. The channel output is a continuous random variable r,
the conditional density functions of which are shown in Fig. P2.23. Determine 
the optimum receiver decision rule and compute the resulting probability of 
error. 


Figure P2.23 Panels: $p_r(\alpha \mid m_0)$ and $p_r(\alpha \mid m_1)$.

2.24 The communication system of Problem 2.23 is now modified by the insertion of a quantizer at the channel output as illustrated in Fig. P2.24.
Determine the optimum decision rule for estimating the transmitted message on 
the basis of the quantizer output r'. Compute the resulting probability of error 
and compare with that obtained without quantization. Hint. Make a discrete 
model for the over-all channel and calculate the transition probabilities. 




r' 



Quantizer 


Figure P2.24 


2.25 A “diversity” communication system employs two channels to transmit a voltage s to a decision device as shown in Fig. P2.25. Thus the decision device has available two received voltages, $r_1$ and $r_2$, in which

$$r_1 = s + n_1, \qquad r_2 = s + n_2.$$

Assume that $n_1$ and $n_2$ are zero-mean Gaussian random variables with variances $\sigma_1^2$ and $\sigma_2^2$ and that s, $n_1$, and $n_2$ are jointly statistically independent. The system is used to communicate one of two messages $m_0$ and $m_1$ with a priori probabilities $P[m_0]$ and $P[m_1]$. For message $m_l$ the signal is

$$s = (-1)^l \sqrt{E}; \quad l = 0, 1.$$

The optimum decision rule seeks to determine that l for which the a posteriori (conditional) probability of $m_l$, given $r_1$ and $r_2$, is maximum.

Figure P2.25

a. Determine the structure of the optimum decision device and calculate the resulting probability of error.

b. Compare this result for $\sigma_1 = \sigma_2$ and $P[m_0] = P[m_1]$ with the performance obtained with an optimum decision based only on $r_1$.

2.26 Derive the inequality 



For what value of α does the equality hold? For what values of α is this bound tighter than the inequality

$$Q(\alpha) < \frac{1}{\sqrt{2\pi}\,\alpha}\, e^{-\alpha^2/2}\,?$$

For what values of α are the two bounds both within 10% of the true value of Q(α)? Hint. Identify $[Q(\alpha)]^2$ as the probability that a pair x, y of independent zero-mean, unit-variance Gaussian random variables lies within the shaded region of (a) in Fig. P2.26. Observe that this probability is exceeded by the probability that x, y lies within the shaded region of (b). Evaluate this last probability by means of a change of variables.

Figure P2.26 Panels (a) and (b).

2.27 Let y and z be two random variables such that $E[(y - z)^2] = 0$. Define the event

$$A = \{\omega: y(\omega) \ne z(\omega)\}$$

and evaluate P[A].

2.28 Let x and y be two random variables with finite second moments and 
define 

z = x + y. 

Prove that 

$$\sigma_x^2 + \sigma_y^2 - 2\sigma_x\sigma_y \le \sigma_z^2 \le \sigma_x^2 + \sigma_y^2 + 2\sigma_x\sigma_y,$$

hence

$$\sigma_z^2 \le 2(\sigma_x^2 + \sigma_y^2).$$

2.29 Let x be any random variable for which the two conditions

$$p_x(\alpha) = 0; \quad \alpha < 0$$

$$\int_{-\infty}^{\infty} \alpha\, p_x(\alpha)\, d\alpha < \infty$$

are met. Prove that for all k > 0

$$P[x \ge k\bar{x}] \le \frac{1}{k}.$$

Is there an acceptable density function $p_x$ for which the equality holds true?

2.30 Let $x_1, x_2, \ldots, x_N$ be a set of N identically distributed statistically independent random variables, each with density function $p_x$ and distribution function $F_x$. As shown in Fig. P2.30, these variables are applied as the inputs to a box that selects as its output, $y_N$, the largest of the $\{x_i\}$. Clearly, $y_N$ is a random variable.

a. Express $p_{y_N}$ in terms of N, $p_x$, and $F_x$.

Figure P2.30 $y_N = \max\{x_1, x_2, \ldots, x_N\}$.

b. Assume that the $x_i$ are exponentially distributed random variables:

$$p_x(\alpha) = \begin{cases} e^{-\alpha}; & \alpha \ge 0, \\ 0; & \alpha < 0. \end{cases}$$

Calculate the expectation $\bar{y}_N$ for N = 1, 2.

Discussion. It can be shown that the general expression for $\bar{y}_N$ in part b is

$$\bar{y}_N = \sum_{j=1}^{N} \frac{1}{j}.$$

In certain communication situations involving diversity transmission over independent Rayleigh-fading paths (see Chapter 7) $x_i$ would represent the energy received over the ith path. (Show that the square of a Rayleigh random variable is exponentially distributed.) A “selection diversity” receiver selects the path with the largest energy. We observe that the mean energy of this path, $\bar{y}_N$, increases to infinity as $N \to \infty$. The incremental advantage of adding another diversity channel, however, decreases rapidly as N becomes large.
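The diminishing return described in the discussion can be seen by tabulating the harmonic sum and, as a cross-check, simulating the selection. The sketch below is illustrative only: it assumes unit-mean exponential variables (the assumption under which the sum of 1/j gives the mean of the largest), and the function names, trial count, and seed are hypothetical.

```python
import random


def mean_of_max_exact(n: int) -> float:
    """Harmonic sum 1 + 1/2 + ... + 1/n, the mean of the largest of n unit-mean exponentials."""
    return sum(1.0 / j for j in range(1, n + 1))


def mean_of_max_simulated(n: int, trials: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean of the largest of n unit-mean exponential samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(1.0) for _ in range(n))
    return total / trials


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        gain = 1.0 / n  # increment contributed by the n-th diversity channel
        print(n, mean_of_max_exact(n), mean_of_max_simulated(n), gain)
```

The increment contributed by the Nth channel is 1/N, so doubling the number of channels adds only about ln 2 to the mean selected energy.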


2.31 Let x be a random variable with mean $\bar{x}$, variance $\sigma_x^2$, and characteristic function $M_x(\nu)$. Define

y = ax + b; a and b constants.

Determine $\bar{y}$, $\sigma_y^2$, and $M_y(\nu)$ in terms of $\bar{x}$, $\sigma_x^2$, $M_x(\nu)$, a, and b.

2.32 Let x and y be statistically independent random variables with the proba- 
bility density functions pf<x) = p y («) = 

a. Calculate the «th moment, Ef* n ], of a. 

b. Calculate the «th absolute moment, of cc. 

c. Determine the characteristic function of x. 

d. Determine and plot the probability density of the sum z = x + y. 

e. Calculate the probability that x is greater than y. 

2.33 A random variable x has characteristic function 


a. Evaluate the constant k. 

b. Calculate x. 

c. If y = x + m, where m is a constant, calculate Mfy). 

2.34 A random variable x which takes on only integer values is said to be “Poisson distributed” if

$$p_x(\alpha) = \sum_{m=0}^{\infty} \delta(\alpha - m)\, \frac{\lambda^m e^{-\lambda}}{m!}.$$

Plot $p_x$ for λ = 2, m ≤ 5.

a. Find $\bar{x}$, $\sigma_x^2$, and $M_x(\nu)$.

b. Let x and y be statistically independent Poisson variables with constants $\lambda_x$ and $\lambda_y$. Define z = x + y. Express $p_z$, $\bar{z}$, and $\sigma_z^2$ in terms of $\lambda_x$ and $\lambda_y$.

2.35 The random variables $x_1$ and $x_2$ have the joint density function

$$p_{x_1,x_2}(\alpha_1, \alpha_2) = \frac{b_1 b_2/\pi^2}{(b_1^2 + \alpha_1^2)(b_2^2 + \alpha_2^2)}; \quad b_1, b_2 > 0.$$

a. Show that $x_1$ and $x_2$ are statistically independent random variables with Cauchy density functions (see Eq. 2.48a).

b. Prove that $M_{x_1}(\nu) = e^{-b_1|\nu|}$.

c. Define $y = x_1 + x_2$. Determine $p_y$.

d. Let $\{z_i\}$ be a set of N statistically independent Cauchy random variables with $b_i = b$, i = 1, 2, ..., N. Define

$$z = \frac{1}{N}\sum_{i=1}^{N} z_i.$$

Determine $p_z$. Is z, which is called the sample mean, a good estimate of the true mean

$$m \triangleq \bar{z}_i\,?$$

2.36 Let $\{x_i\}$ be a set of N statistically independent random variables and define

$$y \triangleq \sum_{i=1}^{N} x_i.$$

a. Which of the following statements are always true? Prove or give a 
counter example. ^ 

(i) 

»=i 

_ N 

(ii) 

<= i 

_ N 

Oii) y 3 -2*** 


M V (V) = ^ M Xi (.V). 

i—1 

N 

< - 2 


(vi) $\left[\overline{y^3} - 3\,\overline{y^2}\,\bar{y} + 2\bar{y}^3\right] = \sum_{i=1}^{N} \left[\overline{x_i^3} - 3\,\overline{x_i^2}\,\bar{x}_i + 2\bar{x}_i^3\right]$.

The combination in brackets in (vi) is called the third semi-invariant moment. In part b we consider the general form of semi-invariant moments.

b. Define $\mu_y(\nu) = \ln M_y(\nu)$ and $\mu_i(\nu) = \ln M_{x_i}(\nu)$.

(i) Show that

$$\mu_y(\nu) = \sum_{i=1}^{N} \mu_i(\nu).$$

Assuming that all moments exist, we may expand $\mu_y(\nu)$ and $\mu_i(\nu)$ in a Taylor series:

$$\mu_y(\nu) = \sum_{j=0}^{\infty} a_j \frac{\nu^j}{j!},$$

$$\mu_i(\nu) = \sum_{j=0}^{\infty} a_{ij} \frac{\nu^j}{j!}.$$

Equating coefficients of equal powers of ν, we obtain

$$a_j = \sum_{i=1}^{N} a_{ij}.$$

Thus the jth coefficient in the expansion for a sum of N independent random variables is the sum of the jth coefficients of the constituents. The jth coefficient is called the jth semi-invariant moment and $\mu_y$ is called the semi-invariant moment generating function.

(ii) Evaluate $a_0$, $a_1$, $a_2$, and $a_3$ in terms of the moments of y.

2.37 a. Prove that

$$P[|x - \bar{x}| \ge \epsilon] \le \frac{\overline{(x - \bar{x})^4}}{\epsilon^4}$$

for any random variable x with $\overline{x^4} < \infty$.

b. Assume that x is a Gaussian random variable and let ε = kσ. Determine the range of k for which the bound of part a is tighter than the Chebyshev bound.

c. Why is the bound of part a not so useful as the Chebyshev bound in proving the weak law of large numbers? Hint. Show that, in general,

$$\overline{(x - \bar{x})^4} \ne \sum_{i=1}^{N} \overline{(y_i - \bar{y}_i)^4},$$

where $x = \sum_{i=1}^{N} y_i$ and the $y_i$ are statistically independent identically distributed random variables.

2.38 We wish to simulate a communication system on a digital computer and 
estimate the error probability by measuring the relative frequency of error. 
Approximate by means of 

a. the Chebyshev inequality, 

b. the central limit theorem, 

c. the Chernoff bound, 

how many independent uses of the channel we must simulate in order to be 99.9% certain that the observed relative frequency lies within 5% of the true $P[\mathcal{E}]$, which we may assume is less than 0.01.

2.39 One of the two equally likely messages is transmitted over a noisy channel by means of the following strategy. If $m_0$ is the message, the transmitter sends a sequence of N voltage pulses over the channel, each with amplitude $\sqrt{E}$. If $m_1$ is the message, N voltage pulses with amplitude $-\sqrt{E}$ are sent. The effect of the channel is to add a (different) statistically independent Gaussian random variable to each amplitude. Thus the channel output is a sequence of N amplitudes

$$r_i = s + n_i; \quad i = 1, 2, \ldots, N,$$

where $s = +\sqrt{E}$ if the message is $m_0$ and $s = -\sqrt{E}$ otherwise. Assume for all i that

$$\bar{n}_i = 0; \qquad \overline{n_i^2} = \sigma^2.$$

a. The receiver calculates

$$y = \sum_{i=1}^{N} r_i$$

and sets $\hat{m} = m_0$ if and only if y > 0. [We shall see in Chapter 4 that such a receiver is optimum.] Determine the resulting $P[\mathcal{E}]$ and show that

$$P[\mathcal{E}] < e^{-NE/2\sigma^2}.$$

b. A suboptimum receiver makes an independent binary decision about m on the basis of each $r_i$ in turn. Let p denote the minimum attainable probability that any such decision is wrong. Obtain an expression for p. The receiver then forms the sum

$$x = \sum_{i=1}^{N} x_i,$$

where $x_i = -1$ if the ith decision favors $m_1$ and $x_i = +1$ if the ith decision favors $m_0$. Thus $P[x_i = 1 \mid m_1] = p$ for all i independently. The receiver sets $\hat{m} = m_0$ if x > 0 and $\hat{m} = m_1$ if x < 0. Use Chernoff bounding techniques to show that

$$P[\mathcal{E}] < [4p(1-p)]^{N/2}.$$

Show that

$$p \approx \frac{1}{2} - \sqrt{\frac{E}{2\pi\sigma^2}}$$

when $E/\sigma^2$ is very small, so that the probability of error bound in part b then becomes

$$P[\mathcal{E}] \lesssim e^{-(NE/2\sigma^2)(2/\pi)}.$$

Comparing this bound with that of part a, we see that the effect of making individual decisions is to multiply E by the factor 2/π (approximately −2 db).
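The 2/π factor can be checked numerically for small $E/\sigma^2$. The sketch below is illustrative only and rests on two assumptions: that the optimum single-pulse decision has error probability $p = Q(\sqrt{E}/\sigma)$, and that the part b bound has the form $[4p(1-p)]^{N/2}$. It compares the per-pulse exponents of the two bounds; the function names are hypothetical.

```python
import math


def q_function(a: float) -> float:
    """Gaussian tail probability Q(a) for a zero-mean, unit-variance variable."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def exponent_ratio(e_over_sigma2: float) -> float:
    """Ratio of the per-pulse exponent of [4p(1-p)]^(N/2) to that of exp(-N E / 2 sigma^2)."""
    p = q_function(math.sqrt(e_over_sigma2))  # assumed single-pulse error probability
    hard_decision_exponent = -0.5 * math.log(4.0 * p * (1.0 - p))
    optimum_exponent = e_over_sigma2 / 2.0
    return hard_decision_exponent / optimum_exponent


def loss_in_db(ratio: float) -> float:
    """Energy factor expressed in decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    for snr in (1.0, 0.1, 0.01, 0.001):
        r = exponent_ratio(snr)
        print(snr, r, loss_in_db(r))
    print(2.0 / math.pi, loss_in_db(2.0 / math.pi))
```

As $E/\sigma^2$ shrinks, the ratio approaches 2/π, that is, a loss of about 2 db relative to the optimum receiver.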



3 


Random Waveforms 



We have considered so far how to make the optimum decision about which message is transmitted when a receiver’s front end and detector are fixed. With reference to Fig. 3.1, this problem involves limiting our observation of what is received to a quantized voltage sample at point a, or to an unquantized voltage sample at point a′. The decision rules established in Chapter 2 are therefore optimum only in the limited sense that they produce the smallest possible probability of error, given that we cannot redesign the way in which the receiver obtains the voltage at point a′ from the random received waveform r(t). To design a receiver that is optimum in an over-all sense, we must focus back on points b and d and investigate the problem of dealing directly with random time functions, instead of just with random voltages.

Figure 3.1 A communication system model with a finite number of input messages $\{m_i\}$, i = 0, 1, ..., M − 1.

In this chapter we consider those mathematical aspects of random wave- 
forms that are essential to our study of this problem. As in Chapter 2, 
the primary objective is to develop the engineering import. 

3.1 RANDOM PROCESSES 

The appropriate mathematical model for dealing with unpredictable
voltages involves the concept of a random variable, $x$, defined as the
assignment of a real number $x(\omega)$ to each point $\omega$ of a sample space $\Omega$ in
such a way that a probability distribution function $F_x$ exists. Unpredictable
waveforms are dealt with in a similar way: instead of only a
single number, however, we now assign to each point $\omega$ in $\Omega$ a real time
function, say $x(\omega, t)$. The situation is illustrated in Fig. 3.2, which shows



Figure 3.2 A simple random process. 


a finite sample space $\Omega$ with four points, and four waveforms, labeled
$x(\omega_i, t)$; $i = 1, 2, 3, 4$.

Now let us think of observing this set of waveforms at some time
instant, $t = t_1$, as shown in the figure. Since each point $\omega_i$ of $\Omega$ has
associated with it both a number $x(\omega_i, t_1)$ and a probability, the collection
of numbers $\{x(\omega_i, t_1)\}$, $i = 1, 2, 3, 4$, forms a random variable. Observing
the waveforms at a second time instant, say $t_2$, yields a different collection
of numbers, hence a different random variable. Indeed, this set of four
waveforms defines a (different) random variable for each choice of
observation instant.
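
The construction can be made concrete with a short Python sketch. The four waveforms and the probabilities used here are hypothetical stand-ins (Fig. 3.2 is not specified numerically); the sketch is meant only to show that fixing an observation instant turns the ensemble into an ordinary discrete random variable.

```python
import math

# Hypothetical four-point sample space: each sample point omega_i carries a
# probability and a waveform x(omega_i, t). These are illustrative
# assumptions, not the waveforms drawn in Fig. 3.2.
PROBABILITIES = [0.1, 0.2, 0.3, 0.4]
WAVEFORMS = [
    lambda t: 1.0,                          # x(omega_1, t): a constant
    lambda t: t,                            # x(omega_2, t): a ramp
    lambda t: math.cos(2.0 * math.pi * t),  # x(omega_3, t): a cosine
    lambda t: -t * t,                       # x(omega_4, t): an inverted parabola
]


def observe(t):
    """Return the random variable x(t) as a list of (value, probability) pairs."""
    return [(w(t), p) for w, p in zip(WAVEFORMS, PROBABILITIES)]


def mean(t):
    """Ensemble mean E[x(t)] at the observation instant t."""
    return sum(value * p for value, p in observe(t))


if __name__ == "__main__":
    for t in (0.0, 0.5):
        print(t, observe(t), mean(t))
```

Calling `observe` at two different instants returns two different collections of values attached to the same four probabilities, which is the sense in which each observation instant defines its own random variable.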

In general, we are interested in the case in which $\Omega$ contains an infinite
number of points $\omega$, and the set of waveforms $\{x(\omega, t)\}$ is correspondingly
rich. For example, $\{x(\omega, t)\}$ might include every real waveform defined
on $-\infty < t < \infty$. Just as with finite $\Omega$, we presume that the collection
of waveforms, together with the probability assignment, defines a random
variable for each choice of observation instant.


The probability system composed of sample space, set of waveforms,
and probability measure is called a random process and is denoted by a
symbol such as $x(t)$. The individual waveforms of a random process
$x(t)$ are called sample functions, and the particular sample function
associated with the point $\omega$ is denoted $x(\omega, t)$. Naming a random process
$x(t)$ and denoting the sample function associated with the point $\omega$ as
$x(\omega, t)$ corresponds to our previous practice of naming a random variable
$x$ and denoting the sample value associated with the point $\omega$ as $x(\omega)$.
We shall find it convenient to use the notation $x(t)$ in two different senses:
first (as above) to denote the random process and second to denote the
random variable obtained by observing the random process at time $t$.
Whenever the sense is not clear from the context, we write $x(t_i)$ or $x_i$ to
denote the random variable observed at time $t_i$.

Interpretation of the Random-Process Model 

Let us consider how the random-process model enters into the problem
of designing a receiver such as that diagrammed in Fig. 3.1. First, which
of the set of possible transmitter waveforms $\{s_i(t)\}$ is actually transmitted
depends on the random input message $m$. We note immediately that the
signal $s(t)$ is a random process with a finite number of sample functions;
the probability that $s(t)$ equals $s_i(t)$ is $P[m_i]$.

Next, consider the channel. Let us assume that nature, in some way
that we can describe only probabilistically, selects one member of a set
containing all possible disturbing waveforms and adds it to $s(t)$. The
appropriate mathematical model then involves a sample space $\Omega$ on which
three random processes and the random input message are defined
simultaneously: associated with any particular sample point $\omega$ is a
message, say $m_i$, the transmitter signal $s(\omega, t) = s_i(t)$, one of the possible
noise waveforms, say $n(\omega, t)$, and the received waveform

$$ r(\omega, t) = s(\omega, t) + n(\omega, t). \tag{3.1a} $$

Since Eq. 3.1a holds for every point $\omega$, we write

$$ r(t) = s(t) + n(t), \tag{3.1b} $$

where $r(t)$, $s(t)$, and $n(t)$ are random processes. Over $\Omega$ the entire set
$\{r(\omega, t)\}$ exhausts all possible pairs of noise and signal waveforms.

The problem confronting us when designing the receiver illustrated in
Fig. 3.1 may now be stated. We look on the random process $r(t)$ as a
black box (encompassing the message source, transmitter, and channel)
at whose output terminals one of the time functions $r(\omega, t)$ appears. In
effect, some hidden mechanism within the box selects a point $\omega$ at random
and emits the corresponding sample function. The receiver must operate
on this sample function, whichever one it may be, in some fixed way to
produce an estimate of the message. The crux of the problem is that we
must specify the receiver operations in advance, whereas we cannot know
in advance which sample function will appear.

Given a fixed receiver design, it is clear that some of the $r(\omega, t)$ will lead
to a correct estimate of the message and some will not. Let us denote
the set of sample points $\omega$ that leads to an incorrect estimate by the
symbol $\mathcal{E}$:

$$ \mathcal{E} = \{\omega : r(\omega, t) \Rightarrow \text{error}\}. $$

Our objective is to design the receiver in such a way that the probability
of this event, that is, the probability of error $P[\mathcal{E}]$, will be minimum. The
subject of Chapter 4 is how to design such a receiver in certain important 
cases. First, however, we must develop appropriate mathematical 
techniques for doing so. Accordingly, we now return to the discussion 
of random processes. 

Random Vectors Obtained from Random Processes 

By definition, a random process implies the existence of an infinite
number of random variables, one for each $t$ in the range $-\infty < t < \infty$.
Thus we may speak of the probability density function $p_{x(t_1)}$ of the random
variable $x(t_1)$ obtained by observing the random process $x(t)$ at time $t_1$.
More generally, for $k$ time instants $t_1, t_2, \ldots, t_k$ we define the $k$ random
variables $x(t_1), x(t_2), \ldots, x(t_k)$ and denote their joint density function by
$p_{\mathbf{x}(\mathbf{t})}$, in which we introduce the notation

$$ \mathbf{x}(\mathbf{t}) = (x(t_1), x(t_2), \ldots, x(t_k)). \tag{3.2a} $$

The components of the random vector $\mathbf{x}(\mathbf{t})$ associated with any particular
sample point $\omega$ are the values of the sample function $x(\omega, t)$ observed
at times $t_1, t_2, \ldots, t_k$:

$$ \mathbf{x}(\omega, \mathbf{t}) = (x(\omega, t_1), x(\omega, t_2), \ldots, x(\omega, t_k)). \tag{3.2b} $$

Note that the density function $p_{\mathbf{x}(\mathbf{t})}$ depends on the random process $x(t)$
and the specific time instants $\{t_i\}$.

As an application consider the probability of obtaining a waveform
that passes through a set of $k$ "windows," as in Fig. 3.3; that is, the
probability of the event

$$ A = \{\omega : a_i < x(\omega, t_i) < b_i;\ i = 1, \ldots, k\}. \tag{3.3a} $$


Figure 3.3 The probability of the event $\{\omega : a_1 < x(\omega, t_1) < b_1,\ a_2 < x(\omega, t_2) < b_2,\ a_3 < x(\omega, t_3) < b_3\}$
is the probability of the set of sample functions which pass
through the windows.

This probability is

$$ P[A] = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_k}^{b_k} p_{\mathbf{x}(\mathbf{t})}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha}. \tag{3.3b} $$

In a similar way, we can calculate the probability of any event defined in 
terms of a finite number of time instants.†

Specification of Random Processes 

We say that a random process $x(t)$ is specified if and only if a rule is
given or implied for determining the joint probability density function
$p_{\mathbf{x}(\mathbf{t})}$ of the random variables observed at any finite set of time instants.

In application, we encounter three methods of specification. The first
and simplest is to state the rule directly. For this to be practical the
joint density function must depend in a known, elementary way on the
time instants. An important example of this method is the Gaussian
process, on which we shall concentrate after discussing filtered impulse
noise.

For the second method a time function involving one or more parameters
is given; for example,

$$ g(t) = r \sin(2\pi t + \theta). \tag{3.4} $$

The parameters $r$ and $\theta$ are then taken to be random variables, with a
specified joint density function $p_{r,\theta}$. The sample functions of the random
process $x(t)$ are then

$$ x(\omega, t) = r(\omega) \sin[2\pi t + \theta(\omega)]; \quad \text{all } \omega \text{ in } \Omega, \tag{3.5} $$

and the sample space is that on which $r$ and $\theta$ are defined. Any density
function $p_{\mathbf{x}(\mathbf{t})}$ can then be derived from knowledge of $p_{r,\theta}$, although the
calculations may be difficult and tedious.

† It is not possible in general to calculate directly the probability of events defined in
terms of an infinite number of time instants, such as the event $B$ defined by

$$ B = \{\omega : x(\omega, t) < 0 \text{ for all } t \text{ in the interval } [0, 1]\}. $$

Probabilities of this sort can be calculated only indirectly in the limit as $k$ becomes
infinite of expressions similar to Eq. 3.3b.

One possible association of waveforms with sample points for the
random process of Eq. 3.5 is illustrated in Fig. 3.4, in which $\Omega$ is taken to
be the two-dimensional plane and the numbers $r(\omega)$ and $\theta(\omega)$ are taken to
be the location of the point $\omega$ in polar coordinates.

Figure 3.4 Random variables $r$ and $\theta$ defined as the polar coordinates of each sample
point $\omega$ when $\Omega$ is a plane.

A possible joint probability density function is

$$ p_{r,\theta}(\alpha, \beta) = \begin{cases} \dfrac{1}{2\pi}\, \alpha\, e^{-\alpha^2/2}; & 0 \le \alpha < \infty,\ 0 \le \beta < 2\pi, \\ 0; & \text{otherwise.} \end{cases} \tag{3.6} $$
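
To illustrate the second method of specification, the following Python sketch draws sample functions of the process of Eq. 3.5 and estimates a window probability of the kind defined in Eq. 3.3a by simple counting. It assumes the joint density of Eq. 3.6 factors into the density $\alpha e^{-\alpha^2/2}$ for $r$ and a uniform density for $\theta$; the function names, trial count, and seed are hypothetical choices made only for illustration.

```python
import math
import random


def draw_parameters(rng):
    """Draw (r, theta) from the joint density assumed for Eq. 3.6."""
    u = 1.0 - rng.random()                 # uniform on (0, 1], so log(u) is finite
    r = math.sqrt(-2.0 * math.log(u))      # inverse-CDF draw from alpha*exp(-alpha^2/2)
    theta = 2.0 * math.pi * rng.random()   # uniform on [0, 2*pi)
    return r, theta


def sample_function(r, theta):
    """Return the waveform x(omega, t) = r sin(2 pi t + theta) of Eq. 3.5."""
    return lambda t: r * math.sin(2.0 * math.pi * t + theta)


def window_probability(times, lows, highs, trials=20000, seed=1):
    """Monte Carlo estimate of P[A]: the fraction of sample functions that
    satisfy lows[i] < x(omega, times[i]) < highs[i] for every i."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sample_function(*draw_parameters(rng))
        if all(lo < x(t) < hi for t, lo, hi in zip(times, lows, highs)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(window_probability([0.1], [0.0], [100.0]))
    print(window_probability([0.1, 0.6], [-1.0, -1.0], [1.0, 1.0]))
```

The estimate replaces the $k$-fold integral of Eq. 3.3b by counting, so it sidesteps the derivation of $p_{\mathbf{x}(\mathbf{t})}$ from $p_{r,\theta}$ at the cost of statistical error that shrinks only as the number of trials grows.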


The third method of specifying a random process is to generate its
ensemble by applying a stated operation to the sample functions of a known
process. A trivial example is the definition of a new process, say $y(t)$, as
the time translate of a given process, say $x(t)$:

$$ y(\omega, t) = x(\omega, t + T); \quad \text{all } \omega \text{ in } \Omega. \tag{3.7} $$

In this case any density function for the new process may be written
immediately in terms of the corresponding (known) density function of
the original; the random vector

$$ \mathbf{y}(\mathbf{t}) = (y(t_1), y(t_2), \ldots, y(t_k)) \tag{3.8a} $$

is equal for every sample point to the random vector $\mathbf{x}(\mathbf{t} + T)$, where

$$ \mathbf{x}(\mathbf{t} + T) = (x(t_1 + T), x(t_2 + T), \ldots, x(t_k + T)). \tag{3.8b} $$

Thus

$$ \mathbf{y}(\mathbf{t}) = \mathbf{x}(\mathbf{t} + T) \tag{3.8c} $$

and

$$ p_{\mathbf{y}(\mathbf{t})} = p_{\mathbf{x}(\mathbf{t} + T)}. \tag{3.9} $$


A more interesting example of the third method, and one that we shall
often encounter, is linear filtering. A new random process $y(t)$ can be
obtained by passing the sample functions of a given process, $x(t)$, through
a linear filter with impulse response $h(t)$, as shown in Fig. 3.5. The sample
functions of the two processes are then related by the convolution integral

$$ y(\omega, t) = \int_{-\infty}^{\infty} h(t - \alpha)\, x(\omega, \alpha)\, d\alpha; \quad \text{all } \omega \text{ in } \Omega. \tag{3.10a} $$

More concisely, we write

$$ y(t) = \int_{-\infty}^{\infty} h(t - \alpha)\, x(\alpha)\, d\alpha. \tag{3.10b} $$

Figure 3.5 The process $x(t)$ is transformed into $y(t)$ by passing each sample
function $x(\omega, t)$ through the linear filter.

In general, it is difficult to determine a density function for the process
$y(t)$ from knowledge of the density functions of the process $x(t)$, although
when $x(t)$ is a Gaussian process we shall see that doing so is simple.
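
The convolution of Eq. 3.10a acts on one sample function at a time, which the following Python sketch makes explicit by approximating the integral with a midpoint Riemann sum. The exponential impulse response, the step size, and the truncation of the integral to a finite span are assumptions chosen for illustration; they are not taken from the text.

```python
import math


def exponential_response(t):
    """Hypothetical causal impulse response h(t) = exp(-t) for t >= 0."""
    return math.exp(-t) if t >= 0.0 else 0.0


def filter_sample_function(x, h, t, step=1e-3, span=10.0):
    """Approximate y(omega, t) = integral of h(t - a) x(omega, a) da.

    x is one sample function and h the impulse response, both ordinary
    functions of time. The integral is truncated to t - span <= a <= t,
    which assumes that h is causal and has decayed to a negligible value
    after `span` seconds.
    """
    count = int(span / step)
    total = 0.0
    for k in range(count):
        a = t - (k + 0.5) * step          # midpoint of the k-th subinterval
        total += h(t - a) * x(a)
    return total * step


if __name__ == "__main__":
    # Filter two different sample functions with the same fixed filter.
    print(filter_sample_function(lambda a: 2.0, exponential_response, 0.0))
    print(filter_sample_function(math.sin, exponential_response, 0.0))
```

Applying the same routine to every member of an ensemble yields the ensemble of the output process; nothing in the routine depends on which sample function it is given.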


Stationary Random Processes 

In dealing with random waveforms in the real world, we often notice
that statistical properties of interest are relatively independent of the time
at which observation of the waveform is begun. For example, the empirical
average of $N$ consecutive samples taken at 1-sec intervals may be
insensitive to the precise time at which the first sample is taken.

A stationary random process is defined as one for which all density
functions are independent of absolute time reference (time origin). Thus
a process $x(t)$ is stationary if, for every finite set of time instants $\{t_i\}$,
$i = 1, 2, \ldots, k$, and for every constant $T$,

$$ p_{\mathbf{x}(\mathbf{t} + T)} = p_{\mathbf{x}(\mathbf{t})}. \tag{3.11} $$

The notation is that defined in Eq. 3.8.

One implication of stationariness is that the probability of the set of
sample functions which passes through the windows of Fig. 3.6 is equal to
the probability of the set of sample functions which passes through the
corresponding time-translated windows. It is not necessarily true that
the two sets consist of the same sample functions.

A second implication of stationariness is that ensemble averages can be
associated with the entire process rather than only with the process
evaluated at some particular instant of time. For example, a stationary
process $x(t)$ can be said to have a mean value $\bar{x}$ or a second moment $\overline{x^2}$
without specifying the instant of observation $t$:

$$ E[x(t)] = \bar{x}; \quad \text{all } t, \tag{3.12a} $$

$$ E[x^2(t)] = \overline{x^2}; \quad \text{all } t. \tag{3.12b} $$

Figure 3.6 Three windows and their time-translates ($t_i' = t_i + T$; $i = 1, 2, 3$).

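That this is so follows directly from Eq. 3.11. For any two observation
instants, $t_1$ and $t_2$, we have

$$ p_{x(t_1)} = p_{x(t_2)} $$

and thus for any $n$

$$ E[x^n(t_1)] = \int_{-\infty}^{\infty} \alpha^n\, p_{x(t_1)}(\alpha)\, d\alpha = \int_{-\infty}^{\infty} \alpha^n\, p_{x(t_2)}(\alpha)\, d\alpha = E[x^n(t_2)]. \tag{3.12c} $$

More generally, it follows from the theorem of expectation (Eq. 2.126)
that the average of any time-invariant function $g$ defined on $k$ samples
from a stationary process $x(t)$ is independent of time origin:

$$ \overline{g(\mathbf{x}(\mathbf{t}))} = \int g(\boldsymbol{\alpha})\, p_{\mathbf{x}(\mathbf{t})}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha} = \int g(\boldsymbol{\alpha})\, p_{\mathbf{x}(\mathbf{t}+T)}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha} = \overline{g(\mathbf{x}(\mathbf{t} + T))}. \tag{3.12d} $$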

A simple example of a stationary random process is the ensemble of
waveforms $\{f(t + \tau)\}$ generated from the periodic ramp $f(t)$ shown in
Fig. 3.7 by taking $\tau$ to be a random variable with the uniform density
function

$$ p_\tau(\beta) = \begin{cases} \dfrac{1}{T}; & 0 \le \beta < T, \\ 0; & \text{elsewhere.} \end{cases} \tag{3.13} $$

We first show that

$$ p_{x(t_1)} = p_{x(t_1 + T)}. $$


The sample function $x(\omega_0, t)$ is illustrated in Fig. 3.8 for a sample point
$\omega_0$ such that $\tau(\omega_0) = \tau_0$. Over $\Omega$ the random variable $\tau$ takes on every
value in the interval $[0, T]$; correspondingly, the random variable $x(t_1)$
takes on every value in the interval $[0, a]$. The transformation from $\tau$ to
$x(t_1)$ is shown in Fig. 3.9.



Figure 3.9 The transformation relating the random variables $\tau$ and $x(t_1)$ is obtained
from Fig. 3.7 by holding $t_1$ fixed and shifting $f(t)$ to the left by the amount $\tau$.

To determine the distribution function of $x(t_1)$, consider first the event
$\{\tau : x(t_1) \le \alpha\}$, in which $\alpha$ is as shown in Fig. 3.10a. The probability of
this event is just the probability that $\tau$ will lie in the shaded interval, $I_0$.




Figure 3.10 The geometrical relations determining $F_{x(t_1)}(\alpha)$; the interval $I_0$ is shaded
in (a) and the intervals $I_1$ and $I_2$ are shaded in (b).

Next consider the event $\{\tau : x(t_1) \le \alpha\}$, where $\alpha$ is as shown in Fig.
3.10b. It is clear that this event will occur if and only if $\tau$ lies in one of the
two intervals $I_1$ and $I_2$, with lengths $T_1$ and $T_2$, respectively. Thus,

$$ F_{x(t_1)}(\alpha) = \frac{T_1 + T_2}{T} = \frac{\alpha}{a}. $$

Since for any $\alpha$, $0 \le \alpha \le a$, the event $\{\tau : x(t_1) \le \alpha\}$ can occur only as
shown either in Fig. 3.10a or b, we have

$$ F_{x(t_1)}(\alpha) = \begin{cases} 0; & \alpha < 0, \\ \dfrac{\alpha}{a}; & 0 \le \alpha < a, \\ 1; & a \le \alpha. \end{cases} $$

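Differentiating with respect to $\alpha$ yields

$$ p_{x(t_1)}(\alpha) = \begin{cases} \dfrac{1}{a}; & 0 \le \alpha \le a, \\ 0; & \text{elsewhere.} \end{cases} \tag{3.14} $$

Finally, we observe that the derivation of Eq. 3.14 is independent of the
observation instant $t_1$ and therefore must yield the same result when
carried through for $x(t_1 + T)$. Thus

$$ p_{x(t_1 + T)} = p_{x(t_1)}; \quad \text{for all } T \text{ and } t_1. $$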

We complete the proof that $x(t)$ is stationary by considering the density
functions $p_{\mathbf{x}(\mathbf{t})}$ and $p_{\mathbf{x}(\mathbf{t}+T)}$ of the random vectors

$$ \mathbf{x}(\mathbf{t}) = (x(t_1), x(t_2), \ldots, x(t_k)), $$

$$ \mathbf{x}(\mathbf{t} + T) = (x(t_1 + T), x(t_2 + T), \ldots, x(t_k + T)). $$

It is clear from Fig. 3.8 that the knowledge that $x(t_1) = \alpha_1$ uniquely
specifies the value of the random variable $\tau$, hence uniquely specifies the
sample function of the process $x(t) = f(t + \tau)$ being observed. Given
that $x(t_1) = \alpha_1$, the value $a_i$ that will be observed at time $t_i$ is therefore not
random but depends only on $\alpha_1$ and the time difference $(t_i - t_1)$: from
Fig. 3.8,

(3.15a) 

Hence the conditional density function

$$ p_{x(t_i)}(\alpha_i \mid x(t_1) = \alpha_1) = \delta(\alpha_i - a_i) $$

is independent of the time origin. Similarly, for any $T$ and $t_i$,

$$ p_{x(t_i + T)}(\alpha_i \mid x(t_1 + T) = \alpha_1) = \delta(\alpha_i - a_i); \quad i = 2, 3, \ldots, k, $$

where $a_i$ is the value specified by Eq. 3.15a. It follows immediately that
for any $\mathbf{t} = (t_1, t_2, \ldots, t_k)$

$$ p_{\mathbf{x}(\mathbf{t})}(\boldsymbol{\alpha}) = p_{x(t_1)}(\alpha_1) \prod_{i=2}^{k} \delta(\alpha_i - a_i) = p_{\mathbf{x}(\mathbf{t}+T)}(\boldsymbol{\alpha}). \tag{3.15b} $$

Thus $x(t)$ is stationary.

The method of analysis used in the foregoing derivation is not restricted
to the simple ramp $f(t)$ but can be applied to any periodic function $g(t)$.
Define the random process $y(t)$ as the ensemble of waveforms $\{g(t + \tau)\}$,
where $\tau$ is again uniformly distributed over the period $T$ of $g(t)$:

$$ y(t) = g(t + \tau), \tag{3.16a} $$

$$ p_\tau(\beta) = \begin{cases} \dfrac{1}{T}; & 0 \le \beta < T, \\ 0; & \text{elsewhere.} \end{cases} \tag{3.16b} $$

We now use an equivalent but less detailed argument to show that $y(t)$
is stationary.



Figure 3.11 A set of barriers and a waveform that passes under them. 


For the proof we introduce the translated process

$$ z(t) = y(t + T) = g(t + T + \tau) \tag{3.17a} $$

and the random vectors

$$ \mathbf{z}(\mathbf{t}) = (z(t_1), z(t_2), \ldots, z(t_k)) = (y(t_1 + T), y(t_2 + T), \ldots, y(t_k + T)) = \mathbf{y}(\mathbf{t} + T). \tag{3.17b} $$

By definition, the process $y(t)$ is stationary if $F_{\mathbf{y}(\mathbf{t})} = F_{\mathbf{z}(\mathbf{t})}$ for all $T$ and
all sets of observation instants $\{t_i\}$. These distribution functions are
equal whenever

$$ P[\{\tau : \mathbf{y}(\mathbf{t}) \le \boldsymbol{\alpha}\}] = P[\{\tau : \mathbf{z}(\mathbf{t}) \le \boldsymbol{\alpha}\}]; \quad \text{all } \boldsymbol{\alpha}; \tag{3.18} $$

that is, whenever the probability of the set of sample functions of $y(t)$
passing under arbitrary barriers such as those of Fig. 3.11 equals the
probability of the set of sample functions of $z(t)$ passing under these
barriers. The following arguments show that Eq. 3.18 is valid.

We note first that if $\tau'$ is determined from $\tau$ and $T$ by the construction
shown in Fig. 3.12, the periodicity of $g(t)$ implies that $g(t + \tau')$ and
$g(t + T + \tau)$ are the same waveform for any $\tau$ in the interval $[0, T]$.
There is a one-to-one correspondence between the sample functions of
$y(t)$ and $z(t)$.


Figure 3.12 The values $\tau$ in the interval $[0, T]$ are mapped onto a circle of circumference
$T$. The point $\tau'$ is that point in the interval $[0, T]$ obtained by starting at $\tau$ and
moving a distance $T$ counterclockwise around the perimeter.

Next, it is convenient to indicate by heavy arcs the intervals of $\tau$ corresponding
to those sample functions $\{g(t + \tau)\}$ of $y(t)$ that pass under
the barriers of Fig. 3.11. A typical situation might be the one shown in
Fig. 3.13a. Arcs may also be used, as illustrated in Fig. 3.13b, to indicate
the intervals of $\tau$ corresponding to sample functions $\{g(t + T + \tau)\}$ of
$z(t)$ that pass under these barriers. In accordance with the construction
of Fig. 3.12, we note that Fig. 3.13b is always obtainable from Fig. 3.13a
by rotating the arcs.

Figure 3.13 The arcs denote the values of $\tau$ for which (a) sample functions of $y(t)$ and
(b) sample functions of $z(t)$ pass under the barriers of Fig. 3.11.

From Eq. 3.16b, the probability that $\tau$ lies in any collection of arcs is
equal to the total length of the collection divided by $T$. Since rotation does
not affect arc length, Eq. 3.18 is valid, and

$$ F_{\mathbf{y}(\mathbf{t})} = F_{\mathbf{z}(\mathbf{t})}, \qquad p_{\mathbf{y}(\mathbf{t})} = p_{\mathbf{y}(\mathbf{t}+T)}. \tag{3.19} $$

This concludes the proof that the random process $y(t)$ is stationary for
any periodic function $g(t)$.
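
The result can be checked numerically for the ramp. The Python sketch below assumes a particular form for $f(t)$ (a linear rise from 0 to the height $a$ over each period, with $f(0) = 0$), since the figure itself is not reproduced here; the function names, seed, and trial count are likewise hypothetical. It estimates the distribution function of $x(t_1) = f(t_1 + \tau)$ at several observation instants.

```python
import random


def ramp(t, period=1.0, height=1.0):
    """Assumed periodic ramp: rises linearly from 0 to `height` over each period."""
    return height * ((t / period) % 1.0)


def empirical_cdf(t_obs, alpha, trials=20000, seed=7, period=1.0, height=1.0):
    """Estimate F_{x(t_obs)}(alpha) for x(t) = f(t + tau), with tau uniform
    over one period, by counting the trials in which f(t_obs + tau) <= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tau = period * rng.random()
        if ramp(t_obs + tau, period, height) <= alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for t_obs in (0.0, 0.37, 5.2):
        print(t_obs, empirical_cdf(t_obs, 0.25))
```

Within sampling error the estimate should be close to $\alpha/a$ whatever observation instant is chosen, in agreement with Eq. 3.14. A check of this kind covers only first-order densities; it does not replace the argument given above for the joint densities.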

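A straightforward extension of the foregoing result involves a periodic
time function $g(\mathbf{w}, t)$ specified in terms of a set of $J$ parameters $\{w_j\}$, none
of which affect the period $T$. Consider the process

$$ x(t) = g(\mathbf{w}, t + \tau), \tag{3.20a} $$

in which we let

$$ \mathbf{w} = (w_1, w_2, \ldots, w_J) \tag{3.20b} $$

be a random vector and $p_\tau$ again be uniform over $[0, T]$. If the random
variables $\{w_j\}$ are independent of $\tau$, that is, if

$$ p_{\mathbf{w},\tau} = p_{\mathbf{w}}\, p_\tau, \tag{3.20c} $$

we have

$$ p_{\mathbf{x}(\mathbf{t})}(\boldsymbol{\alpha}) = \int p_{\mathbf{x}(\mathbf{t})}(\boldsymbol{\alpha} \mid \mathbf{w} = \boldsymbol{\gamma})\, p_{\mathbf{w}}(\boldsymbol{\gamma})\, d\boldsymbol{\gamma}. \tag{3.21} $$

Since, by Eq. 3.19, $p_{\mathbf{x}(\mathbf{t})}(\ \mid \mathbf{w} = \boldsymbol{\gamma})$ is independent of the time origin, so
also is $p_{\mathbf{x}(\mathbf{t})}$. By taking both the period $T$ and the number of components
in $\mathbf{w}$ to be very large, we can generate a rich variety of stationary processes.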

As an example, we apply this result to the process obtained from the
periodic function $\sin 2\pi t$ by letting the amplitude and phase be random
variables:

$$ x(t) = r \sin(2\pi t + \theta). \tag{3.22a} $$

Here, as in Eq. 3.6, we may choose

$$ p_{r,\theta}(\alpha_1, \alpha_2) = \begin{cases} \dfrac{1}{2\pi}\, \alpha_1 e^{-\alpha_1^2/2}; & 0 \le \alpha_1,\ 0 \le \alpha_2 < 2\pi, \\ 0; & \text{elsewhere.} \end{cases} \tag{3.22b} $$

Defining $\tau = \theta/2\pi$ then yields

$$ x(t) = r \sin 2\pi(t + \tau). \tag{3.23} $$

We observe from Eq. 3.22b that $r$ and $\theta$ are statistically independent,
which implies the independence of $r$ and $\tau$. Moreover, since $\theta$ is uniformly
distributed over $[0, 2\pi]$, $\tau$ is uniformly distributed over $[0, 1]$. Finally,
since for any $r$ the function

$$ g(r, t) = r \sin 2\pi t $$

is periodic with period $T = 1$, the preceding discussion implies that the
random process $x(t)$ is stationary. In particular,

$$ p_{x(t_1)} = p_{x(0)}; \quad \text{for any } t_1. $$

It is readily verified that

$$ p_{x(t_1)}(\alpha) = p_{x(0)}(\alpha) = \frac{1}{\sqrt{2\pi}}\, e^{-\alpha^2/2}. $$

Example of a nonstationary process. The requirements that a random
process must meet in order to be stationary are stringent. A simple
example of a nonstationary random process $x(t)$ is the ensemble defined
by

$$ x(\omega, t) = \sin 2\pi f(\omega)\, t; \quad \text{all } \omega \text{ in } \Omega, \tag{3.24a} $$

in which the frequency $f$ is a random variable with the density function

$$ p_f(\alpha) = \begin{cases} \dfrac{1}{W}; & 0 \le \alpha \le W, \\ 0; & \text{otherwise.} \end{cases} \tag{3.24b} $$

Three particular members of this ensemble, for which $f = W/4$, $W/2$, and
$W$, are plotted in Fig. 3.14.

To show that $x(t)$ is nonstationary, we need only observe that every
waveform in the ensemble is

    zero at $t = 0$,

    positive for $0 < t < \dfrac{1}{2W}$,

    negative for $-\dfrac{1}{2W} < t < 0$.

Thus the density function of the random variable $x(t_1)$ obtained by
sampling $x(t)$ at $t_1 = 1/4W$ is identically zero for negative arguments,
whereas the density function of the random variable $x(t_2)$ obtained by
sampling $x(t)$ at $t_2 = -1/4W$ is nonzero only for negative arguments.
Clearly, $p_{x(t_1)} \ne p_{x(t_2)}$, and Eq. 3.11 is not satisfied.

Figure 3.14 Sample functions of a nonstationary random process.
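
The sign argument is easy to confirm by simulation. The Python sketch below assumes that the frequency $f$ is uniformly distributed between 0 and $W$; the function names, the choice $W = 1$, and the seed are hypothetical. It records the fraction of positive and negative observations of the process of Eq. 3.24a at a chosen instant.

```python
import math
import random


def sample_at(t, bandwidth, rng):
    """One observation of x(t) = sin(2 pi f t), with f assumed uniform on (0, W]."""
    f = bandwidth * (1.0 - rng.random())   # 1 - random() lies in (0, 1]
    return math.sin(2.0 * math.pi * f * t)


def sign_fractions(t, bandwidth=1.0, trials=5000, seed=3):
    """Fractions of positive and of negative observations of x(t)."""
    rng = random.Random(seed)
    values = [sample_at(t, bandwidth, rng) for _ in range(trials)]
    positive = sum(v > 0 for v in values) / trials
    negative = sum(v < 0 for v in values) / trials
    return positive, negative


if __name__ == "__main__":
    W = 1.0
    print(sign_fractions(1.0 / (4.0 * W), W))    # observation at t = 1/4W
    print(sign_fractions(-1.0 / (4.0 * W), W))   # observation at t = -1/4W
```

Every observation at $t = 1/4W$ is positive and every observation at $t = -1/4W$ is negative, so the two first-order densities cannot coincide.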

3.2 FILTERED IMPULSE NOISE 

The random processes that we have considered so far are helpful in
consolidating concepts but are not pertinent examples of the noise disturbances
encountered in electrical communication. Filtered impulse
noise, however, is ubiquitous; no electrical circuit is ever without it.
We now consider one source of filtered impulse noise to provide physical 
motivation for a subsequent study of Gaussian processes. 

Figure 3.15 is a simplified diagram of a triode amplifier. Electrons are 
emitted thermionically from the heated cathode and migrate to the plate 
under the influence of the electric field induced by the plate and grid 
voltages. These electrons then flow through the plate circuit filter h(t) 
and return to the cathode. 

The typical transit time of an electron from cathode to plate is roughly
$10^{-9}$ sec. From Fourier analysis a plate circuit filter with bandwidth $W$
has an impulse response with substantial duration, say $\Delta$, of at least $1/W$
sec. It follows that even for a bandwidth of 100 Mc, $\Delta$ is greater than $10^{-8}$
sec and exceeds the transit time by an order of magnitude.

Figure 3.15 A simplified diagram of a triode amplifier stage. The voltage response
$e(t)$ to a unit impulse of current is $h(t)$.


Under these circumstances it is appropriate to conclude that each
electron striking the plate delivers a current impulse of magnitude $q$ to
the filter $h(t)$, where

$$ q = 1.6 \times 10^{-19} \text{ coulomb} $$

is the charge of an electron. Since the plate circuit is linear, the output
voltage $e(t)$ is the superposition of the responses to each electron individually.
As shown in Fig. 3.16a, b, c, we have

$$ i(t) = \sum_{\text{all } i} q\, \delta(t - \tau_i) \tag{3.25a} $$

and

$$ e(t) = \int_{-\infty}^{\infty} i(\alpha)\, h(t - \alpha)\, d\alpha \tag{3.25b} $$

$$ \phantom{e(t)} = \sum_{\text{all } i} q\, h(t - \tau_i), \tag{3.25c} $$

where $\tau_i$ is the time of occurrence of the $i$th current impulse.

It is clear from Fig. 3.16 that the output voltage $e(t)$ depends on the
precise time structure of the electron arrivals. For instance, if the impulses
comprising $i(t)$ occurred periodically, $e(t)$ would be completely predictable
from knowledge of the period and phase. On the other hand,
because of the thermionic origin of the electrons, we cannot predict
precisely the instants $\{\tau_i\}$, and hence the output $e(t)$, for any given vacuum
tube. We say that $e(t)$ is "noisy."

In a mathematical model of our amplifier we treat the $\{\tau_i\}$ as random
variables and $e(t)$ as a random process. Equation 3.25c provides another
example of defining a random process by assigning certain parameters of
a time function as random variables.
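
A direct transcription of Eq. 3.25c into Python is sketched below. The exponential impulse response, its time constant, and the arrival generator with independent exponential gaps are assumptions made purely for illustration; in particular, independent arrivals are a simplification that need not hold in a real tube.

```python
import math
import random

Q = 1.6e-19  # electron charge in coulombs, as quoted in the text


def impulse_response(t, time_constant=1e-8):
    """Hypothetical h(t): a decaying exponential of unit peak, zero for t < 0."""
    return math.exp(-t / time_constant) if t >= 0.0 else 0.0


def output_voltage(t, arrival_times, h=impulse_response):
    """e(t) = sum over i of q h(t - tau_i), the superposition of Eq. 3.25c."""
    return sum(Q * h(t - tau) for tau in arrival_times)


def random_arrivals(rate, duration, rng):
    """Arrival instants on [0, duration) with independent exponential gaps.

    Independence of the gaps is an illustrative assumption only.
    """
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


if __name__ == "__main__":
    rng = random.Random(5)
    taus = random_arrivals(rate=1e9, duration=1e-7, rng=rng)
    print(len(taus), output_voltage(1e-7, taus))
```

Each call to `random_arrivals` selects one set of instants $\{\tau_i\}$, and `output_voltage` then evaluates the corresponding sample function of $e(t)$ at any desired time.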


Statistical Characterization 

The remaining problem is to determine the density functions associated
with the process $e(t)$. For an exact analysis we would first specify the joint
density function of the random variables $\{\tau_i\}$, which is a formidable task.
In space-charge-limited operation, for instance, the $\{\tau_i\}$ are not statistically
independent; a period of greater-than-average emission from the cathode
increases the space-charge density, which inhibits the electron flow
immediately thereafter.

Figure 3.16 The superposition of current impulses in the plate circuit filter. The value of $e(t)$ at any time,
say $t_1$ or $t_2$, is obtained by adding together the plate response (shown in c) to each individual electron.

Fortunately, the fact that the number of electrons contributing to the
output voltage at any particular instant is enormous permits an approximate
analysis of great usefulness. If the current is 1 ma, an average of
$N = 6.25 \times 10^{15}$ electrons strike the plate per second. For $W = 100$ Mc,
the effective number of random variables, $h(t - \tau_i)$, contributing to $e(t)$
in Eq. 3.25 is $N \times \Delta \approx 6.25 \times 10^{7}$.
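
The two figures quoted above follow from elementary arithmetic, reproduced in the short Python sketch below. It takes the filter duration at its minimum value $\Delta = 1/W$; the function names are hypothetical.

```python
ELECTRON_CHARGE = 1.6e-19  # coulombs, as quoted in the text


def electrons_per_second(current_amperes):
    """Average number of electrons per second, N = I / q."""
    return current_amperes / ELECTRON_CHARGE


def effective_contributors(current_amperes, bandwidth_hz):
    """N times Delta, with the filter duration taken as Delta = 1/W."""
    duration = 1.0 / bandwidth_hz
    return electrons_per_second(current_amperes) * duration


if __name__ == "__main__":
    print(electrons_per_second(1e-3))            # 1 ma of plate current
    print(effective_contributors(1e-3, 100e6))   # W = 100 Mc
```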

This situation is a classic example of one to which central limit theorem
arguments may be applied. In essence, the statistical dependencies that
exist among the $\{\tau_i\}$ are insufficiently pronounced to suppress an overwhelming
Gaussian tendency. That this is so has been shown in careful
and detailed analyses (refs. 79, 82) based on reasonable models for the electron
stream. The results can be summarized as follows:

1. The random variable $e(t_1)$ obtained by sampling the random process
$e(t)$ at the output of an amplifier such as that in Fig. 3.15 at any time $t_1$
has a density function that is approximately Gaussian whenever the
electron flow is large.

2. When the amplifier input signal $e_s(t)$ is weak and affects the electron
stream only incrementally (as in the input stages of a communication
receiver), the output random process may be written

$$ e(t) = e^*(t) + n(t), \tag{3.26} $$

where $e^*(t)$ is the (nonrandom) output voltage predicted by noiseless
circuit theory and $n(t)$ is a random process that is independent of $e^*(t)$.

3. In measurements made under stable operating conditions the noise
process $n(t)$ may be considered stationary with zero mean. Thus the
random variables $e(t_1)$ and $n(t_1)$ have mean values

$$ \overline{e(t_1)} = e^*(t_1), \qquad \overline{n(t_1)} = 0. \tag{3.27} $$


These analytical results are in accord with our intuition. Even more 
to the point, these results are consistent with empirical evidence: assuming
these properties in a mathematical model of vacuum-tube noise leads to 
calculations of system performance that agree with experimental results 
under normal operating conditions. 

Noise attributable to the random arrival of discrete charge increments
is also called shot noise.

Statistical Dependency

The preceding discussion makes it reasonable to ascribe a Gaussian
probability density function to any time-sample of a filtered impulse noise
process such as $n(t)$ in Eq. 3.26:

$$ p_{n(t_1)}(\alpha) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\alpha^2/2\sigma^2}, \tag{3.28} $$

where $\sigma^2 = \overline{n^2(t_1)}$. The fact that the right-hand side of Eq. 3.28 is independent
of $t_1$ reflects the assumption that $n(t)$ is stationary. It follows
that for any other observation time $t_2$

$$ p_{n(t_2)} = p_{n(t_1)}. \tag{3.29} $$

The problem of determining expressions for joint density functions such
as $p_{n(t_1), n(t_2)}$, however, must still be considered.

If two random variables are statistically independent, their joint
density function is simply the product of their individual densities. On
the other hand, an assumption that $n(t_1)$ and $n(t_2)$ are statistically
independent is inconsistent with the shot-noise model for many choices
of $t_1$ and $t_2$.

Let us assume that $t_1$ and $t_2$ are separated by an interval shorter than
the substantial duration $\Delta$ of $h(t)$. Since, from Eq. 3.25b, $e(t)$ results from
sliding $h(t - \alpha)$ past $i(\alpha)$ and integrating, many of the impulses that
contribute to $e(t_1)$ also contribute to $e(t_2)$, and these two voltages are
physically dependent. We note in Fig. 3.16 that if $i(\alpha)$ is such that $h(t_1 - \alpha)$
spans a larger than average number of impulses, both $e(t_1)$ and $e(t_2)$ will
tend to be larger than average whenever $|t_1 - t_2| < \Delta$.

This physical dependence between $e(t_1)$ and $e(t_2)$ must be reflected in a
valid mathematical model. In particular, when $t_1$ and $t_2$ are close together,
knowledge that $n(t_1)$ is larger than average must increase the conditional
probability that $n(t_2)$ is larger than average. This implies that for a valid
model

$$ p_{n_1, n_2} \ne p_{n_1}\, p_{n_2}; \quad \text{for } |t_2 - t_1| < \Delta, \tag{3.30a} $$

in which for notational simplicity we define

$$ n_1 \triangleq n(t_1), \qquad n_2 \triangleq n(t_2). \tag{3.30b} $$

The problem of ascribing an appropriate functional form to the joint
density function $p_{n_1, n_2}$ cannot be resolved without additional analysis.

In Section 3.3 we consider central limit theorem arguments which
support the fact that the appropriate choice for $p_{n_1, n_2}$ is the joint Gaussian
density function encountered in Chapter 2. For the moment, however,
it is instructive to study this density function in more detail and to verify
the fact that its properties are consistent with the filtered impulse-noise
model. The joint Gaussian density of Eq. 2.58, generalized to allow an
arbitrary variance $\sigma^2$, is

$$ p_{n_1, n_2}(\alpha, \beta) = \frac{1}{2\pi\sigma^2\sqrt{1 - \rho^2}} \exp\left[-\frac{\alpha^2 - 2\rho\alpha\beta + \beta^2}{2\sigma^2(1 - \rho^2)}\right]; \quad -1 < \rho < 1. \tag{3.31} $$

We first observe (as in Eq. 2.64) that the individual densities $p_{n_1}$ and
$p_{n_2}$ are zero-mean Gaussian: it can be verified readily by integration that

$$ p_{n_1}(\alpha) = p_{n_2}(\alpha) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\alpha^2/2\sigma^2}. \tag{3.32} $$

This is consistent with Eqs. 3.28 and 3.29.

Second, we observe (as in Eq. 2.85) that the conditional density of $n_2$,
given that $n_1$ has value $\alpha$, is

$$ p_{n_2}(\beta \mid n_1 = \alpha) \triangleq \frac{p_{n_1, n_2}(\alpha, \beta)}{p_{n_1}(\alpha)} = \frac{1}{\sqrt{2\pi\sigma^2(1 - \rho^2)}} \exp\left[-\frac{(\beta - \rho\alpha)^2}{2\sigma^2(1 - \rho^2)}\right]; \quad |\rho| < 1. \tag{3.33} $$

The conditional density function is also Gaussian, but the conditional
mean of $n_2$, given $n_1 = \alpha$, is $\rho\alpha$ rather than zero. The conditional variance
is $\sigma^2(1 - \rho^2)$. The terms "conditional mean" and "conditional variance"
refer to the mean and variance of the conditional density function.
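
The conditional mean and variance can be checked by simulation. The Python sketch below draws pairs from the joint Gaussian density of Eq. 3.31 by combining two independent unit Gaussian variables, then averages $n_2$ over those draws for which $n_1$ falls in a narrow window around $\alpha$. The window width, the parameter values $\sigma = 2$ and $\rho = 0.8$, the seed, and the function names are hypothetical choices; the window introduces a small bias that vanishes as its width shrinks.

```python
import math
import random


def draw_pair(sigma, rho, rng):
    """Draw (n1, n2): zero-mean, equal variance sigma^2, covariance rho*sigma^2."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    n1 = sigma * z1
    n2 = sigma * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return n1, n2


def conditional_moments(alpha, sigma=2.0, rho=0.8, width=0.1,
                        trials=100000, seed=11):
    """Sample mean and variance of n2 over draws with |n1 - alpha| < width."""
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        n1, n2 = draw_pair(sigma, rho, rng)
        if abs(n1 - alpha) < width:
            kept.append(n2)
    m = sum(kept) / len(kept)
    v = sum((b - m) ** 2 for b in kept) / len(kept)
    return m, v


if __name__ == "__main__":
    print(conditional_moments(1.0))
```

With these parameter values the estimates should lie near $\rho\alpha = 0.8$ and $\sigma^2(1 - \rho^2) = 1.44$, up to sampling error.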

The parameter $\rho$ plays a central role in determining the structure of
$p_{n_1, n_2}$. For example, we recall from Fig. 2.32 that, as $\rho \to 1$, $p_{n_2}(\beta \mid n_1 = \alpha)$
approaches an impulse function centered on $\alpha$, the conditional variance
becomes smaller and smaller, and there is less and less uncertainty about
the value of $n_2$, once $n_1$ is known. In the measurement of filtered impulse
noise, values of $\rho$ close to 1 pertain to observation times $t_1$ and $t_2$ that are
close together; the value $\rho = 1$ pertains to the degenerate case in which
$t_2$ equals $t_1$ and the two measured filter outputs are one and the same.

In contrast, when $\rho = 0$, the random variables $n_1$ and $n_2$ are statistically
independent, and

$$ p_{n_2}(\beta \mid n_1 = \alpha) = p_{n_2}(\beta). \tag{3.34} $$

Knowledge of the value of $n_1$ tells us nothing about $n_2$. The corresponding
situation occurs in measuring filtered impulse noise when $|t_2 - t_1|$ is much
greater than the effective duration of the filter's impulse response, since
then there is no significant overlap of the impulse patterns determining
the value of the measurements at times $t_1$ and $t_2$. Values of $\rho$ intermediate
between 0 and 1 correspond to values of $|t_2 - t_1|$ that are comparable to
the effective duration of the filter's impulse response.

Figure 3.17 Contour plots of constant probability density for the two-dimensional
Gaussian density function of Eq. 3.31. The density functions themselves are illustrated
in Fig. 2.24 for $\sigma^2 = 1$. (Panel c: $\rho = 0.9$.)

Further insight into the behavior of $p_{n_1,n_2}$ as a function of $\rho$ can be
gained from the contour plots of constant probability density shown in
Fig. 3.17. The contours are most easily visualized in terms of coordinates
$\gamma_1, \gamma_2$ rotated 45° from $\alpha, \beta$. If we let

$$\alpha = \gamma_1 \cos\frac{\pi}{4} - \gamma_2 \sin\frac{\pi}{4}, \tag{3.35a}$$

$$\beta = \gamma_1 \sin\frac{\pi}{4} + \gamma_2 \cos\frac{\pi}{4}, \tag{3.35b}$$

the exponent of $p_{n_1,n_2}$ simplifies to

$$\alpha^2 - 2\rho\alpha\beta + \beta^2 = \gamma_1^2(1 - \rho) + \gamma_2^2(1 + \rho). \tag{3.36}$$

Thus for all $|\rho| < 1$ the contours of equal density are ellipses erected on
the $\gamma_1, \gamma_2$ axes. When $\rho = 0$, the ellipses degenerate into circles, whereas
when $\rho \to +1$ or $-1$ the ellipses degenerate into the $\gamma_1$ and $\gamma_2$ axes,
respectively.
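
The identity of Eq. 3.36 is easy to spot-check numerically. The following Python sketch is an illustration only: the function names and the test points are hypothetical choices, not part of the text. It rotates a few points $(\gamma_1, \gamma_2)$ through 45° as in Eq. 3.35 and compares the two sides of Eq. 3.36.

```python
import math

def rotate_45(gamma1, gamma2):
    """Map rotated coordinates (gamma1, gamma2) to (alpha, beta), as in Eq. 3.35."""
    c = math.cos(math.pi / 4)
    s = math.sin(math.pi / 4)
    return gamma1 * c - gamma2 * s, gamma1 * s + gamma2 * c

def exponent_original(alpha, beta, rho):
    """Left-hand side of Eq. 3.36."""
    return alpha**2 - 2.0 * rho * alpha * beta + beta**2

def exponent_rotated(gamma1, gamma2, rho):
    """Right-hand side of Eq. 3.36."""
    return gamma1**2 * (1.0 - rho) + gamma2**2 * (1.0 + rho)

def max_mismatch(rho, points):
    """Largest absolute difference between the two sides over the given points."""
    worst = 0.0
    for g1, g2 in points:
        a, b = rotate_45(g1, g2)
        diff = abs(exponent_original(a, b, rho) - exponent_rotated(g1, g2, rho))
        worst = max(worst, diff)
    return worst

if __name__ == "__main__":
    pts = [(1.0, 0.0), (0.0, 1.0), (0.7, -1.3), (-2.0, 0.5)]  # arbitrary test points
    for rho in (-0.9, 0.0, 0.5, 0.9):
        print(rho, max_mismatch(rho, pts))
```

The mismatch should be at the level of floating-point rounding for every $\rho$, which is all the sketch is meant to show.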

Joint Gaussian Density Function 

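Covariance. For any two random variables, say $z_i$ and $z_j$, the central
moment

$$\lambda_{ij} \triangleq E[(z_i - \bar{z}_i)(z_j - \bar{z}_j)] \tag{3.37}$$

is called the covariance when $i \neq j$. (The central moment $\lambda_{ii}$ is the variance,
$\sigma_i^2$, of $z_i$.) Since expectation is linear, we also have

$$\lambda_{ij} = \overline{z_i z_j} - 2\bar{z}_i \bar{z}_j + \bar{z}_i \bar{z}_j = \overline{z_i z_j} - \bar{z}_i \bar{z}_j. \tag{3.38}$$

The covariance coefficient, $\rho_{ij}$, of two random variables is defined as

$$\rho_{ij} \triangleq \frac{\lambda_{ij}}{\sigma_i \sigma_j}. \tag{3.39}$$

For the zero-mean Gaussian variables $n_1$ and $n_2$ the covariance is $\overline{n_1 n_2}$.
Recalling that the conditional mean of $n_2$, given $n_1 = \alpha$, is $\rho\alpha$, we have

$$\overline{n_1 n_2} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \alpha\beta\, p_{n_1,n_2}(\alpha, \beta)\, d\beta\, d\alpha
= \int_{-\infty}^{\infty} \alpha\, p_{n_1}(\alpha)\, d\alpha \int_{-\infty}^{\infty} \beta\, p_{n_2}(\beta \mid n_1 = \alpha)\, d\beta
= \rho \int_{-\infty}^{\infty} \alpha^2 p_{n_1}(\alpha)\, d\alpha = \rho\sigma^2. \tag{3.40}$$

The parameter $\rho$ is therefore identified as the covariance coefficient of the
equal-variance random variables $n_1$ and $n_2$:

$$\rho = \frac{\overline{n_1 n_2}}{\sigma^2} \triangleq \rho_{12}. \tag{3.41}$$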

It is always true that the covariance coefficient of two random variables
is restricted to the interval $[-1, 1]$: since the expected value of the square
of any random variable is non-negative, we have

$$0 \le E\left[\left(\frac{z_i - \bar{z}_i}{\sigma_i} \pm \frac{z_j - \bar{z}_j}{\sigma_j}\right)^2\right]
= \frac{\sigma_i^2}{\sigma_i^2} \pm 2\frac{\lambda_{ij}}{\sigma_i \sigma_j} + \frac{\sigma_j^2}{\sigma_j^2}, \tag{3.42a}$$

which implies

$$-1 \le \rho_{ij} \le 1. \tag{3.42b}$$

Whenever $|\rho_{ij}| = 1$, the expectation in Eq. 3.42a equals zero for the
appropriate choice of sign. But the second moment of a random variable can
vanish only if the random variable is zero for all sample points, except
possibly a set of zero probability. Thus

$$\frac{z_i - \bar{z}_i}{\sigma_i} = \pm \frac{z_j - \bar{z}_j}{\sigma_j}; \qquad \rho_{ij} = \pm 1, \tag{3.43}$$

where the random variables are equal in the sense of Eq. 2.68.

Equation 3.43 is consistent with Figs. 2.32 and 3.17. When $\rho = \pm 1$,
then $n_1 = \pm n_2$ and the joint density function $p_{n_1,n_2}$ is impulsive. In such
a case we say that the joint Gaussian density function $p_{n_1,n_2}$ is singular.
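
As an illustration of Eqs. 3.42b and 3.43, the covariance coefficient can be estimated from paired samples. The sketch below is a hedged example rather than anything taken from the text: `covariance_coefficient` is a hypothetical helper that replaces the expectations of Eq. 3.37 by sample averages. Because the same inequality applies to sample averages, the estimate also lies in $[-1, 1]$, and it reaches $\pm 1$ when one sequence is an exact linear function of the other.

```python
import math

def covariance_coefficient(zi, zj):
    """Sample estimate of rho_ij = lambda_ij / (sigma_i * sigma_j)."""
    n = len(zi)
    mean_i = sum(zi) / n
    mean_j = sum(zj) / n
    # Sample versions of the central moments lambda_ij, lambda_ii, lambda_jj
    lam_ij = sum((a - mean_i) * (b - mean_j) for a, b in zip(zi, zj)) / n
    var_i = sum((a - mean_i) ** 2 for a in zi) / n
    var_j = sum((b - mean_j) ** 2 for b in zj) / n
    return lam_ij / math.sqrt(var_i * var_j)

if __name__ == "__main__":
    z = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(covariance_coefficient(z, [3.0 * a + 2.0 for a in z]))   # linear, positive slope
    print(covariance_coefficient(z, [-0.5 * a + 1.0 for a in z]))  # linear, negative slope
    print(covariance_coefficient([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]))
```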

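Unequal variances. Equation 3.31 represents the joint density function
of two zero-mean Gaussian random variables with equal variances. The
unequal-variance case is obtainable from Eq. 3.31 by the elementary
transformation

$$x_1 = n_1,$$
$$x_2 = b n_2; \qquad b > 0. \tag{3.44a}$$

Then $x_1$ and $x_2$ individually are zero-mean Gaussian variables with
central moments

$$\lambda_{11} \triangleq \overline{x_1^2} = \overline{n_1^2} = \sigma^2,$$
$$\lambda_{22} \triangleq \overline{x_2^2} = b^2\, \overline{n_2^2} = b^2\sigma^2,$$
$$\lambda_{12} = \lambda_{21} \triangleq \overline{x_1 x_2} = b\, \overline{n_1 n_2} = b\rho\sigma^2. \tag{3.44b}$$

The covariance coefficient remains unchanged:

$$\rho_{12} \triangleq \frac{\lambda_{12}}{\sigma_{x_1}\sigma_{x_2}} = \frac{b\rho\sigma^2}{\sigma(b\sigma)} = \rho. \tag{3.44c}$$

In terms of these quantities, the zero-mean unequal-variance Gaussian
density can be written by inspection: from Eq. 3.31 and the relation
(cf. Eq. 2.88)

$$p_{x_1,x_2}(\alpha, \beta) = \frac{1}{|b|}\, p_{n_1,n_2}\!\left(\alpha, \frac{\beta}{b}\right),$$

we have, with the shortened notation $\sigma_1^2 = \sigma_{x_1}^2$ and $\sigma_2^2 = \sigma_{x_2}^2$,

$$p_{x_1,x_2}(\alpha, \beta) = \frac{1}{2\pi\sigma(b\sigma)\sqrt{1 - \rho^2}}
\exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{\alpha^2}{\sigma^2} - \frac{2\rho\alpha\beta}{\sigma(b\sigma)} + \frac{\beta^2}{b^2\sigma^2}\right]\right\}$$

$$= \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}}
\exp\left\{-\frac{1}{2(1 - \rho_{12}^2)}\left[\frac{\alpha^2}{\sigma_1^2} - \frac{2\rho_{12}\alpha\beta}{\sigma_1\sigma_2} + \frac{\beta^2}{\sigma_2^2}\right]\right\}. \tag{3.45}$$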

Nonzero means. The general two-dimensional Gaussian density function
is obtained from Eq. 3.45 by making the further transformation

$$z_1 = x_1 + \bar{z}_1,$$
$$z_2 = x_2 + \bar{z}_2, \tag{3.46}$$

which implies

$$p_{z_1,z_2}(\alpha, \beta) = p_{x_1,x_2}(\alpha - \bar{z}_1, \beta - \bar{z}_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{12}^2}}$$
$$\times \exp\left\{-\frac{1}{2(1 - \rho_{12}^2)}\left[\frac{(\alpha - \bar{z}_1)^2}{\sigma_1^2} - \frac{2\rho_{12}(\alpha - \bar{z}_1)(\beta - \bar{z}_2)}{\sigma_1\sigma_2} + \frac{(\beta - \bar{z}_2)^2}{\sigma_2^2}\right]\right\}. \tag{3.47}$$

In writing Eq. 3.47, we have recognized that transformations involving
only the addition of constants, as in Eq. 3.46, do not alter central moments,
so that

$$\sigma_1^2 \triangleq E[(z_1 - \bar{z}_1)^2] = \overline{x_1^2},$$
$$\sigma_2^2 \triangleq E[(z_2 - \bar{z}_2)^2] = \overline{x_2^2},$$
$$\rho_{12} \triangleq \frac{E[(z_1 - \bar{z}_1)(z_2 - \bar{z}_2)]}{\sigma_1\sigma_2} = \frac{\overline{x_1 x_2}}{\sigma_1\sigma_2}. \tag{3.48}$$

Random variables $z_1$ and $z_2$ are called jointly Gaussian if and only if their
density function $p_{z_1,z_2}$ has the form specified by Eqs. 3.47 and 3.48. We
observe that the general joint Gaussian density function depends only on
the means, variances, and covariance (or covariance coefficient). We also
observe once again that two jointly Gaussian random variables are
statistically independent if and only if their covariance is zero.

General linear transformations. One of the most important properties
of the joint Gaussian density function is that random variables defined as
linear transformations of Gaussian variables are also Gaussian. In this
section we show that this is true for any reversible linear transformation
applied to the pair of zero-mean Gaussian variables $x_1$ and $x_2$ of Eq. 3.45.
The generalization to $k$ Gaussian variables, $k > 2$, is deferred until the
next section.

First consider the transformation $(x_1, x_2) \to (x_1, x_3)$ given by

$$x_3 \triangleq a x_1 + b x_2; \qquad b \neq 0. \tag{3.49}$$

The condition $b \neq 0$ guarantees that the transformation is reversible. If
$b = 0$, both $x_1$ and $x_3$ depend only on the single random variable $x_1$ and
$p_{x_1,x_3}$ is singular (that is, it involves impulses). From Eq. 2.76 the
conditional density of $x_3$, given $x_1 = \alpha$, is

$$p_{x_3}(\beta \mid x_1 = \alpha) = \frac{1}{|b|}\, p_{x_2}\!\left(\frac{\beta - a\alpha}{b} \,\Big|\, x_1 = \alpha\right).$$

Multiplying by $p_{x_1}(\alpha)$ yields

$$p_{x_1,x_3}(\alpha, \beta) = \frac{1}{|b|}\, p_{x_1,x_2}\!\left(\alpha, \frac{\beta - a\alpha}{b}\right). \tag{3.50}$$

Substituting Eq. 3.45 in Eq. 3.50 and simplifying, we have, after considerable
(and unrewarding) algebra,

$$p_{x_1,x_3}(\alpha, \beta) = \frac{1}{2\pi\sigma_1\sigma_3\sqrt{1 - \rho_{13}^2}}
\exp\left\{-\frac{1}{2(1 - \rho_{13}^2)}\left[\frac{\alpha^2}{\sigma_1^2} - \frac{2\rho_{13}\alpha\beta}{\sigma_1\sigma_3} + \frac{\beta^2}{\sigma_3^2}\right]\right\}, \tag{3.51}$$

where

$$\sigma_3^2 \triangleq \overline{x_3^2} = a^2\, \overline{x_1^2} + 2ab\, \overline{x_1 x_2} + b^2\, \overline{x_2^2}, \tag{3.52a}$$

$$\rho_{13} \triangleq \frac{\overline{x_1 x_3}}{\sigma_1\sigma_3} = \frac{a\, \overline{x_1^2} + b\, \overline{x_1 x_2}}{\sigma_1\sigma_3}. \tag{3.52b}$$

We note that Eq. 3.51 once more has the Gaussian form.

Next consider the reversible transformation from the pair $(x_1, x_2)$ to
the pair $(x_3, x_4)$ given by

$$x_3 = a x_1 + b x_2; \qquad b \neq 0,$$
$$x_4 = c x_1 + d x_2; \qquad bc - ad \neq 0. \tag{3.53}$$

By writing

$$x_4 = \left(c - \frac{da}{b}\right) x_1 + \frac{d}{b}\, x_3,$$

we observe that the transformation of Eq. 3.53 can be considered as the
cascade $(x_1, x_2) \to (x_1, x_3) \to (x_3, x_4)$. Since each individual step results
in the general Gaussian form, the cascaded transformation does also, and
the proof is complete.




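A simple example of a linear transformation is $(n_1, n_2) \to (x_1, x_2)$, in
which

$$x_1 \triangleq n_1 \cos\frac{\pi}{4} + n_2 \sin\frac{\pi}{4},$$
$$x_2 \triangleq -n_1 \sin\frac{\pi}{4} + n_2 \cos\frac{\pi}{4}, \tag{3.54}$$

and $p_{n_1,n_2}$ is the zero-mean equal-variance joint Gaussian density function
given by Eq. 3.31. Since

$$\sigma_1^2 \triangleq \overline{x_1^2} = \tfrac{1}{2}\overline{n_1^2} + 2\left(\tfrac{1}{2}\right)\overline{n_1 n_2} + \tfrac{1}{2}\overline{n_2^2} = \sigma^2(1 + \rho),$$
$$\sigma_2^2 \triangleq \overline{x_2^2} = \tfrac{1}{2}\overline{n_1^2} - 2\left(\tfrac{1}{2}\right)\overline{n_1 n_2} + \tfrac{1}{2}\overline{n_2^2} = \sigma^2(1 - \rho), \tag{3.55}$$
$$\lambda_{12} \triangleq \overline{x_1 x_2} = -\tfrac{1}{2}\overline{n_1^2} + \tfrac{1}{2}\overline{n_1 n_2} - \tfrac{1}{2}\overline{n_1 n_2} + \tfrac{1}{2}\overline{n_2^2} = 0,$$

from Eq. 3.45 we have

$$p_{x_1,x_2}(\gamma_1, \gamma_2) = \frac{1}{2\pi\sigma^2\sqrt{1 - \rho^2}} \exp\left[-\frac{\gamma_1^2}{2\sigma^2(1 + \rho)} - \frac{\gamma_2^2}{2\sigma^2(1 - \rho)}\right]
= p_{x_1}(\gamma_1)\, p_{x_2}(\gamma_2). \tag{3.56}$$

We observe that the random variables $x_1$ and $x_2$, obtained from the
statistically dependent variables $n_1$ and $n_2$ by the “rotation of coordinates”
transformation of Eq. 3.54, are statistically independent. By choosing the
angle of rotation to be $\tfrac{1}{2}\tan^{-1}[2\rho_{12}\sigma_1\sigma_2/(\sigma_1^2 - \sigma_2^2)]$ rather than $\pi/4$, the
general two-dimensional Gaussian density of Eq. 3.47 can also be
transformed into statistically independent form.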
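
The effect of a rotation on the second moments can be checked directly. The Python sketch below is illustrative only; `rotated_moments` and `decorrelating_angle` are hypothetical names. It applies the rotation of Eq. 3.54 with a general angle $\theta$ to zero-mean variables with variances `var1`, `var2` and covariance `cov12`. As an implementation choice of this sketch, the angle is computed with `atan2`, which also covers the equal-variance case in which the argument of $\tan^{-1}$ is unbounded.

```python
import math

def rotated_moments(var1, var2, cov12, theta):
    """Second moments of x1 = z1*cos(theta) + z2*sin(theta) and
    x2 = -z1*sin(theta) + z2*cos(theta), for zero-mean z1, z2."""
    c, s = math.cos(theta), math.sin(theta)
    new_var1 = var1 * c * c + 2.0 * cov12 * s * c + var2 * s * s
    new_var2 = var1 * s * s - 2.0 * cov12 * s * c + var2 * c * c
    new_cov12 = (var2 - var1) * s * c + cov12 * (c * c - s * s)
    return new_var1, new_var2, new_cov12

def decorrelating_angle(var1, var2, cov12):
    """Half the arctangent of 2*cov12 / (var1 - var2); atan2 handles var1 == var2."""
    return 0.5 * math.atan2(2.0 * cov12, var1 - var2)

if __name__ == "__main__":
    sigma2, rho = 2.0, 0.6  # arbitrary equal-variance example
    print(rotated_moments(sigma2, sigma2, rho * sigma2, math.pi / 4))
    theta = decorrelating_angle(4.0, 1.0, 1.0)  # arbitrary unequal-variance example
    print(theta, rotated_moments(4.0, 1.0, 1.0, theta))
```

For the equal-variance case the rotated variances come out as $\sigma^2(1 + \rho)$ and $\sigma^2(1 - \rho)$ with zero covariance, in agreement with Eq. 3.55.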

Summary. The preceding discussion has established four extremely
important properties of two random variables that are jointly Gaussian,
that is, variables with the joint density function given by Eq. 3.47.

1. The joint Gaussian density function $p_{z_1,z_2}$ depends only on the means
$\bar{z}_1$ and $\bar{z}_2$, the variances $\sigma_{z_1}^2$ and $\sigma_{z_2}^2$, and the covariance

$$\lambda_{12} = E[(z_1 - \bar{z}_1)(z_2 - \bar{z}_2)].$$

2. If $z_1$ and $z_2$ are jointly Gaussian, they are individually Gaussian.

3. Two variables that are jointly Gaussian are statistically independent
if and only if their covariance is zero.

4. Linear transformations on variables that are jointly Gaussian yield
new variables that are also jointly Gaussian.

These four properties are not true in general for two random variables
that are not jointly Gaussian.
In Section 3.3 we derive from multidimensional central limit theorem
arguments a density function for $k$ random variables that is called
$k$-dimensional Gaussian. Random vectors with this density function are
jointly Gaussian. We shall see that the four properties summarized for
the case $k = 2$ extend without change to the case of arbitrary $k$:

1. The density function $p_{\mathbf{z}}$ of a jointly Gaussian random vector
$\mathbf{z} = (z_1, z_2, \ldots, z_k)$ depends only on the means $\{\bar{z}_i\}$ and the set of central
moments $\{\lambda_{ij}\}$:

$$\lambda_{ij} \triangleq E[(z_i - \bar{z}_i)(z_j - \bar{z}_j)]; \qquad i = 1, 2, \ldots, k, \quad j = 1, 2, \ldots, k.$$

2. Any subset of jointly Gaussian random variables is jointly Gaussian.

3. A set of $k$ Gaussian random variables is statistically independent if
and only if the covariances $\lambda_{ij} = 0$ for all $i$ and $j \neq i$. In this case,

$$p_{\mathbf{z}}(\boldsymbol{\alpha}) = \prod_{i=1}^{k} p_{z_i}(\alpha_i) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\alpha_i^2/2\sigma_i^2},$$

in which we have written $\sigma_i^2$ in lieu of $\lambda_{ii}$ and assumed all $\bar{z}_i = 0$.

4. Any linear transformation of a set of $k$ jointly Gaussian random
variables yields new variables that are also jointly Gaussian. In particular,
a weighted sum of Gaussian variables is Gaussian.

Even for $k = 2$ we have seen that the algebra is tedious and that the
general expression for $p_{\mathbf{z}}$ is sufficiently cumbrous that notation is a problem.
For $k > 2$ the simplification that results from the use of matrix notation
is essential. This notation is reviewed in Appendix 3A and is applied in
Section 3.3 to verify the foregoing properties of joint Gaussian variables.
Since only the properties themselves, and not their proofs, are used in the
sequel, Section 3.3 may be omitted on a first reading.


3.3 THE MULTIVARIATE CENTRAL LIMIT THEOREM

Insight into the appropriate mathematical model for filtered impulse
noise is gained from consideration of the multivariate central limit theorem,
which reduces for a single random variable to the central limit theorem
of Chapter 2. The theorem is proved by means of the joint characteristic
function, denoted $M_{\mathbf{x}}(\boldsymbol{\nu})$, of a set of $k$ random variables

$$\mathbf{x} \triangleq (x_1, x_2, \ldots, x_k). \tag{3.58}$$

Joint Characteristic Functions

We define $M_{\mathbf{x}}(\boldsymbol{\nu})$ to be that function obtained from the joint density
function $p_{\mathbf{x}}$ by performing a Fourier transformation on each argument:

$$M_{\mathbf{x}}(\nu_1, \nu_2, \ldots, \nu_k) \triangleq \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p_{\mathbf{x}}(\alpha_1, \alpha_2, \ldots, \alpha_k)\, e^{j\nu_1\alpha_1} e^{j\nu_2\alpha_2} \cdots e^{j\nu_k\alpha_k}\, d\alpha_1\, d\alpha_2 \cdots d\alpha_k.$$

Using matrix notation and the theorem of expectation, we can write this
more concisely as

$$M_{\mathbf{x}}(\boldsymbol{\nu}) \triangleq \int_{-\infty}^{\infty} p_{\mathbf{x}}(\boldsymbol{\alpha})\, e^{j\boldsymbol{\nu}\boldsymbol{\alpha}^T} d\boldsymbol{\alpha}
= \overline{e^{j\boldsymbol{\nu}\mathbf{x}^T}} = E\left[\exp\left(j \sum_{i=1}^{k} \nu_i x_i\right)\right] = E\left[\prod_{i=1}^{k} e^{j\nu_i x_i}\right]. \tag{3.59}$$

If the $\{x_i\}$ are statistically independent,

$$M_{\mathbf{x}}(\boldsymbol{\nu}) = \prod_{i=1}^{k} E[e^{j\nu_i x_i}] = \prod_{i=1}^{k} M_{x_i}(\nu_i). \tag{3.60}$$

Note that $M_{\mathbf{x}}(\boldsymbol{\nu})$ is a function of $k$ arguments: $\boldsymbol{\nu} = (\nu_1, \nu_2, \ldots, \nu_k)$.
Equation 3.60 should be contrasted with Eq. 2.98, the characteristic
function of a sum of independent random variables, which is a function of
only one variable $\nu$.

Just as in the one-dimensional case, the joint density function $p_{\mathbf{x}}$ can
be regained from $M_{\mathbf{x}}$ by the inverse Fourier transform; in matrix notation
this is written

$$p_{\mathbf{x}}(\boldsymbol{\alpha}) = \frac{1}{(2\pi)^k} \int_{-\infty}^{\infty} M_{\mathbf{x}}(\boldsymbol{\nu})\, e^{-j\boldsymbol{\nu}\boldsymbol{\alpha}^T} d\boldsymbol{\nu}. \tag{3.61}$$

The only essential difference between single and multidimensional Fourier
transformations is the amount of labor required to evaluate the integrals.

Moments. In addition to their role in establishing limit theorems, joint
characteristic functions are useful in calculating moments. This property
has already been exploited in connection with one-dimensional
characteristic functions. The general $k$-dimensional case is a straightforward
extension. We first note from the definition of Eq. 3.59 that if we define
the complex random variable $w$ as

$$w \triangleq j\boldsymbol{\nu}\mathbf{x}^T = j(\nu_1 x_1 + \nu_2 x_2 + \cdots + \nu_k x_k), \tag{3.62a}$$

then

$$M_{\mathbf{x}}(\boldsymbol{\nu}) = \overline{e^{w}}. \tag{3.62b}$$

If the moments exist,† we can expand $e^w$ in a power series to obtain

$$\overline{e^{w}} = E\left[1 + w + \frac{w^2}{2!} + \frac{w^3}{3!} + \cdots\right]
= 1 + \bar{w} + \frac{\overline{w^2}}{2!} + \frac{\overline{w^3}}{3!} + \cdots. \tag{3.62c}$$

Let us examine the second term in the expansion. From Eq. 3.62a

$$\bar{w} = j(\nu_1 \bar{x}_1 + \nu_2 \bar{x}_2 + \cdots + \nu_k \bar{x}_k), \tag{3.62d}$$

which involves only the means of the $\{x_i\}$. In the derivation of the
multivariate central limit, we shall be concerned with zero-mean random
variables. Therefore, we now particularize to the case in which the means
$\{\bar{x}_i\}$ are all zero. Letting $\mathbf{0}$ denote the vector each component of which is
zero, we have

$$E[\mathbf{x}] = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k) = (0, 0, \ldots, 0) = \mathbf{0}.$$

The second term in Eq. 3.62c is then $\bar{w} = 0$.

Let us next examine the third term, $\overline{w^2}/2$. Since

$$w^2 = j^2\left(\sum_{i=1}^{k} \nu_i x_i\right)^2 = -\sum_{i=1}^{k}\sum_{j=1}^{k} \nu_i x_i x_j \nu_j,$$

taking the expectation yields

$$\overline{w^2} = -\sum_{i=1}^{k}\sum_{j=1}^{k} \nu_i\, \overline{x_i x_j}\, \nu_j.$$

For zero-mean random variables $x_i$ and $x_j$ the covariance $\lambda_{ij}$ is

$$\lambda_{ij} = \overline{x_i x_j}; \qquad \bar{x}_i = \bar{x}_j = 0.$$

Thus

$$\overline{w^2} = -\sum_{i=1}^{k}\sum_{j=1}^{k} \nu_i \lambda_{ij} \nu_j, \tag{3.62e}$$

and $\lambda_{ij}$ may be evaluated by determining the coefficient multiplying $\nu_i \nu_j$ in
the power series expansion of $M_{\mathbf{x}}(\boldsymbol{\nu})$.

Similarly, examination of $\overline{w^3}$ and higher order terms shows that the
coefficient of any term such as $(\nu_i \nu_j \cdots \nu_l)$ in the power-series expansion of
$M_{\mathbf{x}}(\boldsymbol{\nu})$ is proportional to the corresponding mixed moment $\overline{(x_i x_j \cdots x_l)}$.
Thus all moments that exist may be evaluated by expanding $M_{\mathbf{x}}(\boldsymbol{\nu})$ in a
power series.

† Since $|M_{\mathbf{x}}(\boldsymbol{\nu})| = \left|\int_{-\infty}^{\infty} e^{j\boldsymbol{\nu}\boldsymbol{\alpha}^T} p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha}\right| \le \int_{-\infty}^{\infty} |e^{j\boldsymbol{\nu}\boldsymbol{\alpha}^T}|\, p_{\mathbf{x}}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha} = 1$, the characteristic
function is finite for all $\boldsymbol{\nu}$. If only the first $J$ moments of $w$ exist, it follows that $\overline{e^w}$ may
be expanded in terms of $\overline{w^j}$, $1 \le j \le J$, plus a remainder term that vanishes as $\boldsymbol{\nu} \to \mathbf{0}$.

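The covariance matrix. The form of the expression for $\overline{w^2}$ in Eq. 3.62e
can be simplified by using matrix notation. Recognizing that the double
summation is a quadratic form (cf. Eq. 3A.19), we can write

$$\overline{w^2} = -\boldsymbol{\nu}\boldsymbol{\Lambda}_x\boldsymbol{\nu}^T, \tag{3.63}$$

for which we define the matrix

$$\boldsymbol{\Lambda}_x \triangleq \begin{pmatrix}
\lambda_{11} & \lambda_{12} & \cdots & \lambda_{1k} \\
\lambda_{21} & \lambda_{22} & \cdots & \lambda_{2k} \\
\vdots & \vdots & & \vdots \\
\lambda_{k1} & \lambda_{k2} & \cdots & \lambda_{kk}
\end{pmatrix}. \tag{3.64}$$

$\boldsymbol{\Lambda}_x$ is called the covariance matrix of $\mathbf{x}$. Since $\lambda_{ij} = \lambda_{ji}$ for all $i$ and $j$, a
covariance matrix is symmetric about its principal diagonal.

The covariance matrix plays a central role in the multidimensional
central limit theorem. Observing that†

$$\mathbf{x}^T\mathbf{x} = \begin{pmatrix}
x_1 x_1 & x_1 x_2 & \cdots & x_1 x_k \\
x_2 x_1 & x_2 x_2 & \cdots & x_2 x_k \\
\vdots & \vdots & & \vdots \\
x_k x_1 & x_k x_2 & \cdots & x_k x_k
\end{pmatrix}, \tag{3.65}$$

we can write $\boldsymbol{\Lambda}_x$ in the compact form

$$\boldsymbol{\Lambda}_x = E[\mathbf{x}^T\mathbf{x}] = \overline{\mathbf{x}^T\mathbf{x}}, \tag{3.66}$$

where the expectation $E[\mathbf{A}]$ of a matrix $\mathbf{A}$ with elements $\{a_{ij}\}$ is defined as
the matrix whose elements are $\{\overline{a_{ij}}\}$. With this notation and the fact that

$$\boldsymbol{\nu}\mathbf{x}^T = \sum_{i=1}^{k} \nu_i x_i = \mathbf{x}\boldsymbol{\nu}^T, \tag{3.67a}$$

we observe that Eq. 3.63 may also be derived directly by the sequence of
equalities

$$\overline{w^2} \triangleq E[(j\boldsymbol{\nu}\mathbf{x}^T)^2] = -E[(\boldsymbol{\nu}\mathbf{x}^T)(\mathbf{x}\boldsymbol{\nu}^T)]
= -\boldsymbol{\nu}\,\overline{\mathbf{x}^T\mathbf{x}}\,\boldsymbol{\nu}^T = -\boldsymbol{\nu}\boldsymbol{\Lambda}_x\boldsymbol{\nu}^T. \tag{3.67b}$$

† Since $\mathbf{x}\mathbf{x}^T = \mathbf{x}\cdot\mathbf{x} = \sum_{i=1}^{k} x_i^2$, Eq. 3.65 is a good example that matrix multiplication is
not commutative.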

Central Limit Argument 

The development of the multivariate central limit theorem which 
follows exactly mirrors the development of the corresponding one- 
dimensional theorem in Chapter 2. The only distinction is that the use 
of matrix notation now permits us to treat a sum of random vectors 
rather than just a sum of random variables. 

Let us consider the vector $\mathbf{z} = (z_1, z_2, \ldots, z_k)$ defined as

$$\mathbf{z} \triangleq \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \mathbf{x}_i. \tag{3.68}$$

We assume that each $\mathbf{x}_i$ is a $k$-component vector that is statistically
independent of all others:

$$p_{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N} = \prod_{i=1}^{N} p_{\mathbf{x}_i}. \tag{3.69a}$$

Also, we assume that each $\mathbf{x}_i$ has the same density function, say $p_{\mathbf{x}}$, with
zero mean, covariance matrix $\boldsymbol{\Lambda}_x$, and characteristic function $M_{\mathbf{x}}$:

Here we have used the fact that $\mathbf{x}_i$ and $\mathbf{x}_j$ are statistically independent to
evaluate

$$E[\mathbf{x}_i^T\mathbf{x}_j] = E[\mathbf{x}_i^T]\, E[\mathbf{x}_j] = (0); \qquad j \neq i,$$

in which $(0)$ denotes the matrix each element of which is zero.

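We now take up the limiting form of the characteristic function of $\mathbf{z}$:

$$M_{\mathbf{z}}(\boldsymbol{\nu}) \triangleq E[e^{j\boldsymbol{\nu}\mathbf{z}^T}]
= E\left[\exp\left(j\boldsymbol{\nu}\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\mathbf{x}_i^T\right)\right]
= E\left[\prod_{i=1}^{N}\exp\left(j\frac{\boldsymbol{\nu}}{\sqrt{N}}\mathbf{x}_i^T\right)\right].$$

Since the random vectors $\{\mathbf{x}_i\}$ are statistically independent, so are the
random variables $\{\exp(j\boldsymbol{\nu}\mathbf{x}_i^T/\sqrt{N})\}$, and the mean of their product is
therefore the product of their means:

$$M_{\mathbf{z}}(\boldsymbol{\nu}) = \prod_{i=1}^{N} E\left[\exp\left(j\frac{\boldsymbol{\nu}}{\sqrt{N}}\mathbf{x}_i^T\right)\right]
= \prod_{i=1}^{N} M_{\mathbf{x}_i}\!\left(\frac{\boldsymbol{\nu}}{\sqrt{N}}\right)
= \left[M_{\mathbf{x}}\!\left(\frac{\boldsymbol{\nu}}{\sqrt{N}}\right)\right]^N.$$

Taking logarithms on both sides, we have

$$\ln M_{\mathbf{z}}(\boldsymbol{\nu}) = N \ln M_{\mathbf{x}}\!\left(\frac{\boldsymbol{\nu}}{\sqrt{N}}\right). \tag{3.71}$$

The limiting behavior of the right-hand side of Eq. 3.71 can be determined
by expanding first $M_{\mathbf{x}}(\boldsymbol{\nu}/\sqrt{N})$, and then $\ln M_{\mathbf{x}}(\boldsymbol{\nu}/\sqrt{N})$, in a power
series. We assume for simplicity that all moments $\overline{w^j}$, $j = 1, 2, \ldots$, are
finite. The proof may be extended to the case in which only $\overline{w^2}$ is finite by
expanding in a power series with remainder. If $\overline{w^2}$ is not finite, the central
limit theorem is not valid. From Eqs. 3.62 and 3.70

$$M_{\mathbf{x}}(\boldsymbol{\nu}) = 1 + \bar{w} + \frac{\overline{w^2}}{2!} + \frac{\overline{w^3}}{3!} + \cdots,$$

where

$$w = j\boldsymbol{\nu}\mathbf{x}^T = j\sum_{i=1}^{k} \nu_i x_i,$$
$$\bar{w} = 0,$$
$$\overline{w^2} = -\boldsymbol{\nu}\boldsymbol{\Lambda}_x\boldsymbol{\nu}^T = -\boldsymbol{\nu}\boldsymbol{\Lambda}_z\boldsymbol{\nu}^T.$$

Thus

$$M_{\mathbf{x}}\!\left(\frac{\boldsymbol{\nu}}{\sqrt{N}}\right) = 1 - \frac{1}{2N}\boldsymbol{\nu}\boldsymbol{\Lambda}_z\boldsymbol{\nu}^T + \frac{1}{N^{3/2}} f_{\boldsymbol{\nu}}\!\left(\frac{1}{\sqrt{N}}\right), \tag{3.72a}$$

where

$$f_{\boldsymbol{\nu}}\!\left(\frac{1}{\sqrt{N}}\right) \triangleq \frac{\overline{w^3}}{3!} + \frac{\overline{w^4}}{4!\sqrt{N}} + \cdots \tag{3.72b}$$

is a continuous function of $1/\sqrt{N}$ that for any fixed $\boldsymbol{\nu}$ approaches the
constant $\overline{w^3}/6$ as $N$ becomes large. By taking $N$ large enough, we can make

$$\left| -\frac{1}{2N}\boldsymbol{\nu}\boldsymbol{\Lambda}_z\boldsymbol{\nu}^T + \frac{1}{N^{3/2}} f_{\boldsymbol{\nu}}\!\left(\frac{1}{\sqrt{N}}\right) \right|$$

as small as we wish. It follows that for sufficiently large $N$ we may invoke
the expansion

$$\ln(1 + u) = u - \frac{u^2}{2} + \frac{u^3}{3} - \cdots; \qquad |u| < 1$$

and write

$$\ln M_{\mathbf{x}}\!\left(\frac{\boldsymbol{\nu}}{\sqrt{N}}\right) = -\frac{1}{2N}\boldsymbol{\nu}\boldsymbol{\Lambda}_z\boldsymbol{\nu}^T + \frac{1}{N^{3/2}} f_{\boldsymbol{\nu}}\!\left(\frac{1}{\sqrt{N}}\right) + (\text{other terms}).$$

Since the “other terms” involve powers of $N$ more negative than $-3/2$, we
have, for any fixed $\boldsymbol{\nu}$,

$$\lim_{N\to\infty} \ln M_{\mathbf{z}}(\boldsymbol{\nu}) = \lim_{N\to\infty} N \ln M_{\mathbf{x}}\!\left(\frac{\boldsymbol{\nu}}{\sqrt{N}}\right)
= \lim_{N\to\infty} N\left[-\frac{1}{2N}\boldsymbol{\nu}\boldsymbol{\Lambda}_z\boldsymbol{\nu}^T + \frac{1}{N^{3/2}} f_{\boldsymbol{\nu}}\!\left(\frac{1}{\sqrt{N}}\right) + (\text{other terms})\right]
= -\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_z\boldsymbol{\nu}^T.$$

From the continuity of the exponential function

$$\lim_{N\to\infty} M_{\mathbf{z}}(\boldsymbol{\nu}) = \exp\left(-\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_z\boldsymbol{\nu}^T\right). \tag{3.73a}$$

Equation 3.73a is our desired result. For any $p_{\mathbf{x}}$ the characteristic
function $M_{\mathbf{z}}(\boldsymbol{\nu})$ of a normalized sum $\mathbf{z} = (1/\sqrt{N})\sum_{i=1}^{N}\mathbf{x}_i$ of identically
distributed zero-mean random vectors $\{\mathbf{x}_i\}$ approaches $\exp[-\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_z\boldsymbol{\nu}^T]$. This
limiting function involves only the covariance matrix $\boldsymbol{\Lambda}_z = \boldsymbol{\Lambda}_x$. Note that
when $k = 1$ Eq. 3.73a reduces to

$$\lim_{N\to\infty} M_z(\nu) = \exp\left(-\tfrac{1}{2}\nu\lambda_{11}\nu\right) = \exp\left(-\tfrac{1}{2}\nu^2\sigma^2\right) \tag{3.73b}$$

and is in accord with the one-dimensional central limit result of Eq. 2.178.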
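
The central limit argument can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the component distribution (sums of uniform variables), the values $N = 30$ and 20,000 trials, the seed, and all function names are arbitrary choices of this example. It forms the normalized sum of Eq. 3.68 for $k = 2$, estimates the second moments of $\mathbf{z}$, and measures how often $|z_1|$ falls within one standard deviation, which for a Gaussian variable happens with probability about 0.683.

```python
import math
import random

def draw_x(rng):
    """One zero-mean, non-Gaussian 2-vector with correlated components."""
    u = rng.uniform(-1.0, 1.0)
    w = rng.uniform(-1.0, 1.0)
    return (u, u + w)  # covariance matrix [[1/3, 1/3], [1/3, 2/3]]

def normalized_sum(rng, n_terms):
    """z = (1/sqrt(N)) * sum of N independent copies of x (Eq. 3.68)."""
    s1 = s2 = 0.0
    for _ in range(n_terms):
        x1, x2 = draw_x(rng)
        s1 += x1
        s2 += x2
    scale = 1.0 / math.sqrt(n_terms)
    return (s1 * scale, s2 * scale)

def simulate(n_terms=30, n_trials=20000, seed=1):
    """Sample second moments of z and the fraction of |z1| within one sigma."""
    rng = random.Random(seed)
    m11 = m12 = m22 = 0.0
    inside = 0
    sigma1 = math.sqrt(1.0 / 3.0)  # standard deviation of z1
    for _ in range(n_trials):
        z1, z2 = normalized_sum(rng, n_terms)
        m11 += z1 * z1
        m12 += z1 * z2
        m22 += z2 * z2
        if abs(z1) < sigma1:
            inside += 1
    return (m11 / n_trials, m12 / n_trials, m22 / n_trials, inside / n_trials)

if __name__ == "__main__":
    print(simulate())
```

The estimated moments should lie close to the entries of $\boldsymbol{\Lambda}_x$, since the normalization by $1/\sqrt{N}$ preserves the covariance matrix for every $N$; only the shape of the distribution changes as $N$ grows.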
Gaussian Random Variables 

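In Chapter 2 we called a single random variable “Gaussian” if its
characteristic function had the form of the right-hand side of Eq. 3.73b.
We now generalize the definition to $k$ variables by saying that any
zero-mean random vector $\mathbf{y}$ is Gaussian if and only if

$$M_{\mathbf{y}}(\boldsymbol{\nu}) = \exp\left(-\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_y\boldsymbol{\nu}^T\right). \tag{3.74}$$

Alternatively, we say that the components $\{y_i\}$ of $\mathbf{y}$ are “jointly Gaussian.”
Equation 3.73a states that $M_{\mathbf{z}}(\boldsymbol{\nu})$ approaches the Gaussian form as $N \to \infty$.

The density function of a zero-mean Gaussian vector $\mathbf{y}$ is determined by
taking the inverse Fourier transform of $M_{\mathbf{y}}$. However, just as in the
one-dimensional case, we must refrain from claiming that the density function $p_{\mathbf{z}}$
of a normalized sum $\mathbf{z}$ necessarily converges to Gaussian form as $N$ gets large.
Whenever $p_{\mathbf{x}}$ does not contain impulses, the convergence occurs. However,
if $p_{\mathbf{x}}$ contains impulses, so does $p_{\mathbf{z}}$. As in Chapter 2, it is only the
distribution function $F_{\mathbf{z}}$ that always becomes Gaussian (provided $\boldsymbol{\Lambda}_x$ exists).

The definition of “jointly Gaussian” is extended to vectors with nonzero
means in an obvious way. If $\mathbf{x}$ is a $k$-dimensional vector with mean

$$E[\mathbf{x}] = \mathbf{m}_x = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k), \tag{3.75a}$$

then $\mathbf{x}$ is called Gaussian if and only if the zero-mean vector

$$\mathbf{y} = \mathbf{x} - \mathbf{m}_x \tag{3.75b}$$

is Gaussian; that is, if and only if

$$M_{\mathbf{y}}(\boldsymbol{\nu}) = \exp\left(-\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_y\boldsymbol{\nu}^T\right). \tag{3.75c}$$

We therefore have

$$M_{\mathbf{x}}(\boldsymbol{\nu}) \triangleq \overline{e^{j\boldsymbol{\nu}(\mathbf{y} + \mathbf{m}_x)^T}}
= \overline{e^{j\boldsymbol{\nu}\mathbf{y}^T}}\, e^{j\boldsymbol{\nu}\mathbf{m}_x^T} = M_{\mathbf{y}}(\boldsymbol{\nu})\, e^{j\boldsymbol{\nu}\mathbf{m}_x^T}
= \exp\left(-\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_y\boldsymbol{\nu}^T + j\boldsymbol{\nu}\mathbf{m}_x^T\right).$$

Since, in accordance with the definition of Eq. 3.37, the covariance of
$x_i$ and $x_j$ is

$$\lambda_{ij} = E[(x_i - \bar{x}_i)(x_j - \bar{x}_j)] = E[y_i y_j],$$

the covariance matrices of $\mathbf{x}$ and $\mathbf{y}$ are the same:

$$\boldsymbol{\Lambda}_x = \boldsymbol{\Lambda}_y.$$

(We again note that central moments are invariant to transformations
involving only the addition of constants.) We conclude that the general
form of the Gaussian characteristic function is

$$M_{\mathbf{x}}(\boldsymbol{\nu}) = \exp\left(-\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_x\boldsymbol{\nu}^T + j\boldsymbol{\nu}\mathbf{m}_x^T\right), \tag{3.76a}$$

where

$$\mathbf{m}_x = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k), \tag{3.76b}$$

$$\boldsymbol{\Lambda}_x \triangleq E[(\mathbf{x} - \mathbf{m}_x)^T(\mathbf{x} - \mathbf{m}_x)]. \tag{3.76c}$$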

Filtered Impulse Noise Process 

The appropriateness of assuming that samples $\mathbf{n}(\mathbf{t}) = (n(t_1), n(t_2), \ldots,
n(t_k))$ observed from a filtered impulse noise source at any set of times
$\{t_j\}$ are modeled mathematically by a joint Gaussian density function
hinges on the multivariate central limit theorem. Let $\mathbf{h}_i$ be the random
vector obtained by sampling the plate response to the $i$th current impulse
in Fig. 3.16 at times $t_1, t_2, \ldots, t_k$:

$$\mathbf{h}_i = (h(t_1 - \tau_i), h(t_2 - \tau_i), \ldots, h(t_k - \tau_i)). \tag{3.77a}$$

As in Eq. 3.25, $h(t)$ is the (known) filter impulse response, and the random
variable $\tau_i$ is the arrival time of the $i$th impulse. Whenever the impulse
arrival times $\{\tau_i\}$ are substantially independent and the average number
of impulses arriving during the effective duration of the filter’s impulse
response is extremely large, the central limit theorem implies that the
density function of

$$\mathbf{n}(\mathbf{t}) = \sum_i \mathbf{h}_i \tag{3.77b}$$

is closely approximated by the joint Gaussian density function.

Properties of Gaussian Random Variables 

Before considering the form of the multivariate Gaussian density
function, it is instructive to observe certain properties implied by the
definition of the Gaussian characteristic function. We now show that any
set of $k$ jointly Gaussian random variables, say $\mathbf{x} = (x_1, x_2, \ldots, x_k)$,
exhibits the four properties claimed on p. 156.

Property 1. The joint density function of the $\{x_i\}$, $p_{\mathbf{x}}$, depends only on
the means $\mathbf{m}_x$ and the central moments $\lambda_{ij}$, $i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, k$.

Proof follows from the fact that the Gaussian characteristic function
$M_{\mathbf{x}}$ depends only on $\mathbf{m}_x$ and the covariance matrix $\boldsymbol{\Lambda}_x$, with elements
$\{\lambda_{ij}\}$. Thus $p_{\mathbf{x}}$, the Fourier transform of $M_{\mathbf{x}}$, also depends only on these
quantities.

Property 2. Any subset of the $\{x_i\}$, say $\mathbf{x}_0 = (x_1, x_2, \ldots, x_l)$, where
$1 \le l < k$, is also jointly Gaussian.

Proof follows from noting that, if $\boldsymbol{\nu}_0 \triangleq (\nu_1, \nu_2, \ldots, \nu_l)$,

$$M_{\mathbf{x}_0}(\boldsymbol{\nu}_0) = E[e^{j(\nu_1 x_1 + \nu_2 x_2 + \cdots + \nu_l x_l)}]
= E[e^{j(\nu_1 x_1 + \nu_2 x_2 + \cdots + \nu_l x_l + 0\cdot x_{l+1} + \cdots + 0\cdot x_k)}]
= M_{\mathbf{x}}(\nu_1, \nu_2, \ldots, \nu_l, 0, 0, \ldots, 0). \tag{3.78a}$$

From Eq. 3.76

$$M_{\mathbf{x}}(\boldsymbol{\nu}) = \exp\left(-\frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k} \nu_i\lambda_{ij}\nu_j + j\sum_{i=1}^{k} \nu_i\bar{x}_i\right). \tag{3.78b}$$

Substituting Eq. 3.78b in Eq. 3.78a and discarding terms that are equal to
zero, we again have the Gaussian form

$$M_{\mathbf{x}_0}(\boldsymbol{\nu}_0) = \exp\left(-\frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \nu_i\lambda_{ij}\nu_j + j\sum_{i=1}^{l} \nu_i\bar{x}_i\right). \tag{3.78c}$$

As a special case, we note that each component $x_i$ of a Gaussian vector $\mathbf{x}$
is individually Gaussian, with variance $\sigma_i^2 = \lambda_{ii}$ and mean $\bar{x}_i$:

$$M_{x_i}(\nu_i) = M_{\mathbf{x}}(0, 0, \ldots, \nu_i, \ldots, 0) = \exp\left(-\tfrac{1}{2}\nu_i^2\sigma_i^2 + j\nu_i\bar{x}_i\right). \tag{3.78d}$$

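Property 3. The $\{x_i\}$ are statistically independent if their covariance
matrix is diagonal; that is, if all covariances $\{\lambda_{ij}\}$ are zero for $j \neq i$:

$$\lambda_{ij} = \sigma_i^2\delta_{ij}, \tag{3.79a}$$

in which we use the Kronecker delta, defined by

$$\delta_{ij} \triangleq \begin{cases} 1; & \text{for } j = i, \\ 0; & \text{for } j \neq i. \end{cases} \tag{3.79b}$$

Proof follows from substituting the diagonal covariance matrix

$$\boldsymbol{\Lambda}_x = \begin{pmatrix}
\sigma_1^2 & & & 0 \\
& \sigma_2^2 & & \\
& & \ddots & \\
0 & & & \sigma_k^2
\end{pmatrix} \tag{3.80a}$$

in the expression for the characteristic function:

$$M_{\mathbf{x}}(\boldsymbol{\nu}) = \exp\left(-\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_x\boldsymbol{\nu}^T + j\boldsymbol{\nu}\mathbf{m}_x^T\right)
= \exp\left(-\frac{1}{2}\sum_{i=1}^{k} \nu_i^2\sigma_i^2 + j\sum_{i=1}^{k} \nu_i\bar{x}_i\right)$$
$$= \prod_{i=1}^{k} \exp\left(-\tfrac{1}{2}\nu_i^2\sigma_i^2 + j\nu_i\bar{x}_i\right) = \prod_{i=1}^{k} M_{x_i}(\nu_i). \tag{3.80b}$$

Taking the inverse Fourier transform in accordance with Eq. 3.61, we
obtain

$$p_{\mathbf{x}}(\boldsymbol{\alpha}) = \frac{1}{(2\pi)^k}\int_{-\infty}^{\infty} M_{\mathbf{x}}(\boldsymbol{\nu})\, e^{-j\boldsymbol{\nu}\boldsymbol{\alpha}^T} d\boldsymbol{\nu}
= \prod_{i=1}^{k} \frac{1}{2\pi}\int_{-\infty}^{\infty} M_{x_i}(\nu_i)\, e^{-j\nu_i\alpha_i}\, d\nu_i
= \prod_{i=1}^{k} p_{x_i}(\alpha_i), \tag{3.80c}$$

which completes the proof. Equation 3.80c states that any set of (not
necessarily Gaussian) random variables is statistically independent
whenever their joint characteristic function factors. For Gaussian variables the
factorability of the characteristic function is guaranteed by the condition
that the covariance matrix be diagonal.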

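Property 4. Let $\mathbf{y} = (y_1, y_2, \ldots, y_k)$ be a set of random variables
obtained from $\mathbf{x}$ by means of the transformation

$$\mathbf{y}^T = \mathbf{A}\mathbf{x}^T + \mathbf{a}^T, \tag{3.81}$$

where $\mathbf{A}$ is any $k \times k$ matrix. Then $\mathbf{y}$ is also jointly Gaussian.

Proof follows from showing that the joint characteristic function of $\mathbf{y}$
has the Gaussian form of Eq. 3.76. By definition,

$$M_{\mathbf{y}}(\boldsymbol{\nu}) \triangleq \overline{e^{j\boldsymbol{\nu}\mathbf{y}^T}} = \overline{e^{j\boldsymbol{\nu}(\mathbf{A}\mathbf{x}^T + \mathbf{a}^T)}}
= \overline{e^{j(\boldsymbol{\nu}\mathbf{A})\mathbf{x}^T}}\, e^{j\boldsymbol{\nu}\mathbf{a}^T}
= M_{\mathbf{x}}(\boldsymbol{\nu}\mathbf{A})\, e^{j\boldsymbol{\nu}\mathbf{a}^T}.$$

Since $M_{\mathbf{x}}(\boldsymbol{\nu})$ is Gaussian,

$$M_{\mathbf{y}}(\boldsymbol{\nu}) = \left\{\exp\left[-\tfrac{1}{2}(\boldsymbol{\nu}\mathbf{A})\boldsymbol{\Lambda}_x(\boldsymbol{\nu}\mathbf{A})^T + j(\boldsymbol{\nu}\mathbf{A})\mathbf{m}_x^T\right]\right\} e^{j\boldsymbol{\nu}\mathbf{a}^T}
= \exp\left[-\tfrac{1}{2}\boldsymbol{\nu}(\mathbf{A}\boldsymbol{\Lambda}_x\mathbf{A}^T)\boldsymbol{\nu}^T + j\boldsymbol{\nu}(\mathbf{A}\mathbf{m}_x^T + \mathbf{a}^T)\right]. \tag{3.82}$$

are statistically independent. (Note that we had previously established in
Eq. 2.142 the restricted result that a sum of independent Gaussian variables
is Gaussian.)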
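
Equation 3.82 shows that the transformed vector has covariance matrix $\mathbf{A}\boldsymbol{\Lambda}_x\mathbf{A}^T$ and mean $\mathbf{A}\mathbf{m}_x^T + \mathbf{a}^T$. A minimal Python sketch of this bookkeeping follows; the helper names and the numerical values are hypothetical, and matrices are stored as plain lists of rows to keep the example self-contained.

```python
def matmul(p, q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p[i][m] * q[m][j] for m in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(p):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*p)]

def transform_gaussian(a_matrix, a_offset, mean_x, cov_x):
    """Mean and covariance of y^T = A x^T + a^T, read off from Eq. 3.82."""
    mean_y = [sum(a_matrix[i][m] * mean_x[m] for m in range(len(mean_x))) + a_offset[i]
              for i in range(len(a_matrix))]
    cov_y = matmul(matmul(a_matrix, cov_x), transpose(a_matrix))
    return mean_y, cov_y

if __name__ == "__main__":
    A = [[1, 0], [2, 3]]        # arbitrary transformation matrix
    a = [1, -1]                 # arbitrary constant vector
    m_x = [0.5, 2]              # arbitrary mean of x
    L_x = [[2, 1], [1, 3]]      # arbitrary covariance matrix of x
    print(transform_gaussian(A, a, m_x, L_x))
```

As a check on the second component, $y_2 = 2x_1 + 3x_2 - 1$ has variance $4(2) + 2(2)(3)(1) + 9(3) = 47$, which is the lower right entry of the computed matrix.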

In matrix algebra [43] it is shown that for any nonsingular† covariance
matrix $\boldsymbol{\Lambda}_y$ a reversible transformation $\mathbf{B}\mathbf{y}^T = \mathbf{x}^T$ exists such that

$$\boldsymbol{\Lambda}_x = \mathbf{B}\boldsymbol{\Lambda}_y\mathbf{B}^T \tag{3.85}$$

is diagonal. Thus an arbitrary nonsingular set of Gaussian random
variables $\{y_i\}$ can always be transformed into a set of statistically
independent Gaussian random variables $\{x_i\}$. Conversely, the transformation
$\mathbf{A} = \mathbf{B}^{-1}$ applied to the $\{x_i\}$ regains the statistically dependent
variables $\{y_i\}$. A diagonalizing transformation $\mathbf{B}$ also exists when $\boldsymbol{\Lambda}_y$ is
singular, but in this case $\mathbf{B}$ is irreversible.

The Multivariate Gaussian Density Function 

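The form of the joint Gaussian density function is easily obtained by
first considering a set, say $\mathbf{x} = (x_1, x_2, \ldots, x_k)$, of statistically independent
zero-mean Gaussian random variables:

$$p_{\mathbf{x}}(\boldsymbol{\alpha}) = \prod_{i=1}^{k} p_{x_i}(\alpha_i) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\alpha_i^2/2\sigma_i^2}. \tag{3.86a}$$

In matrix notation

$$p_{\mathbf{x}}(\boldsymbol{\alpha}) = \frac{1}{(2\pi)^{k/2}|\boldsymbol{\Lambda}_x|^{1/2}} \exp\left(-\tfrac{1}{2}\boldsymbol{\alpha}\boldsymbol{\Lambda}_x^{-1}\boldsymbol{\alpha}^T\right), \tag{3.86b}$$

where $\boldsymbol{\Lambda}_x$ is the diagonal matrix

$$\boldsymbol{\Lambda}_x = \begin{pmatrix}
\sigma_1^2 & & & 0 \\
& \sigma_2^2 & & \\
& & \ddots & \\
0 & & & \sigma_k^2
\end{pmatrix} \tag{3.86c}$$

with inverse

$$\boldsymbol{\Lambda}_x^{-1} = \begin{pmatrix}
1/\sigma_1^2 & & & 0 \\
& 1/\sigma_2^2 & & \\
& & \ddots & \\
0 & & & 1/\sigma_k^2
\end{pmatrix} \tag{3.86d}$$

and determinant

$$|\boldsymbol{\Lambda}_x| = \sigma_1^2\sigma_2^2 \cdots \sigma_k^2. \tag{3.86e}$$

† As in Appendix 2A, we say that $\boldsymbol{\Lambda}_y$, or $\mathbf{y}$, is singular whenever the determinant
$|\boldsymbol{\Lambda}_y| = 0$, which implies that the inverse matrix $\boldsymbol{\Lambda}_y^{-1}$ does not exist. In this case some
of the component random variables comprising $\mathbf{y}$ are equal to linear combinations of
the others, so that $p_{\mathbf{y}}$ contains impulses. A singular random vector $\mathbf{y}$ results when a
nonsingular random vector $\mathbf{z}$ is transformed by an irreversible matrix $\mathbf{C}$; that is, when
$\mathbf{y}^T = \mathbf{C}\mathbf{z}^T$ and $|\mathbf{C}| = 0$, so that $\mathbf{z}$ cannot be regained from knowledge of $\mathbf{y}$.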

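Next, let us consider the (Gaussian) random vector obtained from $\mathbf{x}$
by the reversible matrix transformation

$$\mathbf{y} = \mathbf{x}\mathbf{A}^T + \mathbf{m}_y \triangleq \mathbf{f}(\mathbf{x}). \tag{3.87a}$$

Here, the notation $\mathbf{f}(\ )$ is that of Appendix 2A. The inverse
transformation is defined as

$$\mathbf{x} = \mathbf{g}(\mathbf{y}) = (\mathbf{y} - \mathbf{m}_y)\mathbf{B}^T, \tag{3.87b}$$

where $\mathbf{B}$ is the matrix inverse to $\mathbf{A}$:

$$\mathbf{B} = \mathbf{A}^{-1}. \tag{3.87c}$$

It follows from Eq. 2A.7 that

$$p_{\mathbf{y}}(\boldsymbol{\beta}) = p_{\mathbf{x}}(\mathbf{g}(\boldsymbol{\beta}))\, |J_g(\boldsymbol{\beta})|, \tag{3.88}$$

where $|J_g(\boldsymbol{\beta})|$ is the absolute value of the Jacobian of the
transformation $\mathbf{g}(\ )$.

By definition, the $(i, j)$th element of $J_g(\boldsymbol{\beta})$ is

$$\frac{\partial g_i(\boldsymbol{\beta})}{\partial \beta_j}. \tag{3.89a}$$

From Eq. 3.87b

$$\mathbf{g}(\boldsymbol{\beta}) = (\boldsymbol{\beta} - \mathbf{m}_y)\mathbf{B}^T.$$

Denoting the $(i, j)$ element of $\mathbf{B}$ by $b_{ij}$, we have

$$g_i(\boldsymbol{\beta}) = \sum_{j=1}^{k} b_{ij}(\beta_j - \bar{y}_j) \tag{3.89b}$$

and

$$\frac{\partial g_i(\boldsymbol{\beta})}{\partial \beta_j} = b_{ij}. \tag{3.89c}$$

We conclude that the Jacobian is just the determinant of the matrix $\mathbf{B}$,

$$J_g(\boldsymbol{\beta}) = |\mathbf{B}|, \tag{3.89d}$$

and is independent of $\boldsymbol{\beta}$.

Substitution of Eqs. 3.86b and 3.89d in Eq. 3.88 then yields

$$p_{\mathbf{y}}(\boldsymbol{\beta}) = \frac{|\mathbf{B}|}{(2\pi)^{k/2}|\boldsymbol{\Lambda}_x|^{1/2}} \exp\left[-\tfrac{1}{2}(\boldsymbol{\beta} - \mathbf{m}_y)\mathbf{B}^T\boldsymbol{\Lambda}_x^{-1}\mathbf{B}(\boldsymbol{\beta}^T - \mathbf{m}_y^T)\right].$$

The last step in determining the general form of the multivariate
Gaussian density function is identification of terms. Since $\mathbf{B} = \mathbf{A}^{-1}$ and

$$\boldsymbol{\Lambda}_y = E[(\mathbf{y}^T - \mathbf{m}_y^T)(\mathbf{y} - \mathbf{m}_y)] = E[\mathbf{A}\mathbf{x}^T\mathbf{x}\mathbf{A}^T] = \mathbf{A}\boldsymbol{\Lambda}_x\mathbf{A}^T,$$

invoking Eqs. 3A.28 and 3A.30 we have

$$\mathbf{B}^T\boldsymbol{\Lambda}_x^{-1}\mathbf{B} = (\mathbf{A}^{-1})^T\boldsymbol{\Lambda}_x^{-1}\mathbf{A}^{-1} = (\mathbf{A}\boldsymbol{\Lambda}_x\mathbf{A}^T)^{-1} = \boldsymbol{\Lambda}_y^{-1}.$$

Using the well-known results for determinants [43] that

$$|\mathbf{A}^T| = |\mathbf{A}|$$

and that for any two ($k \times k$) matrices

$$|\mathbf{C}\mathbf{D}| = |\mathbf{C}|\,|\mathbf{D}|,$$

which implies

$$|\mathbf{I}| = |\mathbf{A}^{-1}\mathbf{A}| = |\mathbf{A}^{-1}|\,|\mathbf{A}| = 1,$$

we have

$$\frac{|\mathbf{B}|^2}{|\boldsymbol{\Lambda}_x|} = \frac{1}{|\mathbf{A}|\,|\boldsymbol{\Lambda}_x|\,|\mathbf{A}^T|} = \frac{1}{|\mathbf{A}\boldsymbol{\Lambda}_x\mathbf{A}^T|} = \frac{1}{|\boldsymbol{\Lambda}_y|}.$$

Accordingly, $p_{\mathbf{y}}$ may be written concisely as

$$p_{\mathbf{y}}(\boldsymbol{\beta}) = \frac{1}{(2\pi)^{k/2}|\boldsymbol{\Lambda}_y|^{1/2}} \exp\left[-\tfrac{1}{2}(\boldsymbol{\beta} - \mathbf{m}_y)\boldsymbol{\Lambda}_y^{-1}(\boldsymbol{\beta} - \mathbf{m}_y)^T\right]. \tag{3.90}$$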

Equation 3.90 is the general form of the nonsingular Gaussian density
function. We observe that it depends only on the mean vector $\mathbf{m}_y$ and the
covariance matrix $\boldsymbol{\Lambda}_y$. That Eq. 3.90 represents the most general form of
the nonsingular Gaussian density function follows from the fact that any
such set of Gaussian variables may be obtained by matrix transformation
from a statistically independent set. As a final point, we note that, since
$p_{\mathbf{y}}$ and $M_{\mathbf{y}}$ are Fourier transforms of each other, the right-hand side of
Eq. 3.90 is the inverse transform of

$$M_{\mathbf{y}}(\boldsymbol{\nu}) = \exp\left(-\tfrac{1}{2}\boldsymbol{\nu}\boldsymbol{\Lambda}_y\boldsymbol{\nu}^T + j\boldsymbol{\nu}\mathbf{m}_y^T\right). \tag{3.91}$$

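As an example of Eq. 3.90, consider the two-dimensional case with
covariance matrix

$$\boldsymbol{\Lambda}_y = \begin{pmatrix} \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 \end{pmatrix}; \qquad |\rho| < 1.$$

Then

$$|\boldsymbol{\Lambda}_y| = \sigma^4(1 - \rho^2),$$

$$\boldsymbol{\Lambda}_y^{-1} = \frac{1}{|\boldsymbol{\Lambda}_y|}\begin{pmatrix} \sigma^2 & -\rho\sigma^2 \\ -\rho\sigma^2 & \sigma^2 \end{pmatrix}
= \frac{1}{\sigma^2(1 - \rho^2)}\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}.$$

If $\mathbf{m}_y = (0, 0)$, we have

$$p_{\mathbf{y}}(\boldsymbol{\beta}) = \frac{1}{2\pi\sigma^2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2\sigma^2(1 - \rho^2)}\,(\beta_1, \beta_2)\begin{pmatrix} 1 & -\rho \\ -\rho & 1 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}\right]$$

$$= \frac{1}{2\pi\sigma^2\sqrt{1 - \rho^2}} \exp\left[-\frac{\beta_1^2 - 2\rho\beta_1\beta_2 + \beta_2^2}{2\sigma^2(1 - \rho^2)}\right].$$

This density function has already been studied in detail in Section 3.2.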
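
The agreement between the matrix form of Eq. 3.90 and the equal-variance closed form can also be confirmed numerically. The following sketch is an illustration only: the function names and parameter values are hypothetical, and the case $k = 2$ is written out by hand so that no matrix library is needed.

```python
import math

def gaussian_density_2d(beta, mean, cov):
    """Eq. 3.90 for k = 2, with the 2x2 inverse and determinant written out."""
    (l11, l12), (l21, l22) = cov
    det = l11 * l22 - l12 * l21
    if det <= 0.0:
        raise ValueError("covariance matrix must be nonsingular")
    d1 = beta[0] - mean[0]
    d2 = beta[1] - mean[1]
    # (beta - m) Lambda^{-1} (beta - m)^T using the adjugate form of the inverse
    quad = (l22 * d1 * d1 - (l12 + l21) * d1 * d2 + l11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def equal_variance_density(b1, b2, sigma2, rho):
    """Closed form for zero means, equal variances sigma2, and coefficient rho."""
    norm = 2.0 * math.pi * sigma2 * math.sqrt(1.0 - rho * rho)
    quad = (b1 * b1 - 2.0 * rho * b1 * b2 + b2 * b2) / (2.0 * sigma2 * (1.0 - rho * rho))
    return math.exp(-quad) / norm

if __name__ == "__main__":
    sigma2, rho = 1.5, 0.4  # arbitrary example values
    cov = [[sigma2, rho * sigma2], [rho * sigma2, sigma2]]
    for b in [(0.0, 0.0), (1.0, -0.5), (-2.0, 0.3)]:
        print(b, gaussian_density_2d(b, (0.0, 0.0), cov),
              equal_variance_density(b[0], b[1], sigma2, rho))
```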


3.4 THE GAUSSIAN PROCESS 

Consider a random process $x(t)$, and let $\mathbf{x}(\mathbf{t}) = (x(t_1), x(t_2), \ldots, x(t_k))$
denote the random variables obtained by sampling $x(t)$ at the set of $k$ time
instants $\{t_i\}$. If the variables $\mathbf{x}(\mathbf{t})$ are jointly Gaussian for every finite
set of observation instants $\{t_i\}$, then $x(t)$ is called a Gaussian process.

The conditions that a process must meet in order to be Gaussian are
stringent. The one-dimensional central limit theorem of Chapter 2 has
been used, however, to argue that the output of a filtered impulse noise
source, observed at any single time $t_1$, can be adequately modeled
mathematically by a Gaussian random variable. More generally, the
multivariate central limit theorem justifies the assumption that $k$ output samples
observed at any set of times $\{t_i\}$ can be adequately modeled by $k$ random
variables that are jointly Gaussian. The important condition for the
validity of such a mathematical model is that the values of the observed
samples depend on the sum of a large number of relatively independent
perturbations. Since there are many circumstances, such as thermal
noise in resistors, diffusion noise in transistors, spontaneous emission
noise in masers, and intergalactic noise in radio astronomy, in which this
condition is met, Gaussian processes are of the utmost practical (as well as
mathematical) importance.

Specification of Gaussian Processes 

We have seen that an arbitrary random process is considered specified 
if and only if a rule is implied for determining the joint density function of 
samples taken at any finite set of time instants {/,}, i = 1,2,..., k. One' 
of the important properties of a set of jointly Gaussian variables, say 
x = (x u , **), is that p x depends only on the mean values 

E[x] = nij, = (*!> xl , . . . , (3.92a) 

and the set of covariances! {?.„}, i — 1, .2, . . • , k; j = 1, 2, . . . , k, in 
which _ 

Aii l = E[( x i "" X i)( X i ~~ X t>\ 

— xjCj — xfo. (3.92bj 

Letting x i denote the random variable x(t t ), i = 1, 2, . . . , k, we see that 
a Gaussian process x(t) is completely specified by knowledge of how the 
means and covariancerdepen^^ 

The mean function. To be able to specify m* for any set of instants 
(rj, it is necessary and sufficient that we know a function m x (t), called 
the mean function of x{t), defined by 

m x (f) = E[a:(f)j. (3.93) 

For example, since x t and x, denote the random variables obtained by 
sampling the process at times t t and t } , respectively, 

Xi = mjti) (3.94a) 

x'j = m x (t } ). (3.94b) 

The covariance function. Similarly, in order to be able to specify the
set of covariances $\{\lambda_{ij}\}$ for any set of instants $\{t_i\}$, it is necessary and
sufficient that we know a function $\mathcal{L}_x(t, s)$, called the covariance function
of $x(t)$, defined by

$\mathcal{L}_x(t, s) = E[(x(t) - m_x(t))(x(s) - m_x(s))]$. (3.95a)

Then for samples $x_i$ and $x_j$ taken at $t_i$ and $t_j$, respectively, we have

$\lambda_{ij} = \mathcal{L}_x(t_i, t_j)$. (3.95b)

† We refer generically to the $\lambda_{ij}$ as “covariances,” even though terms for which j = i
are variances.

In interpreting $\mathcal{L}_x(t, s)$, we think of observing each sample function
$x(\omega, t)$ of the process first at some particular time t and again at some
time s, as shown in Fig. 3.18. The product of these two samples (with
means subtracted) is $[x(\omega, t) - m_x(t)][x(\omega, s) - m_x(s)]$. The covariance
function $\mathcal{L}_x(t, s)$ describes how the expected value of this product, over
the ensemble of points $\omega$ in $\Omega$, varies as a function of the sampling instants
t and s.



Figure 3.18 Interpretation of the covariance function. For this particular choice of
t and s we see that $m_x(t) = 1$, $m_x(s) = 0$. Let z be a random variable defined by
$z(\omega) = [x(\omega, t) - 1][x(\omega, s)]$. Then

$z(\omega_0) = (2 - 1)(-8/3) = -8/3$
$z(\omega_1) = (1 - 1)(3) = 0$
$z(\omega_2) = (3 - 1)(4) = 8$
$z(\omega_3) = (-2 - 1)(2) = -6$
and $\mathcal{L}_x(t, s) = E[z] = -7/3$.

Example. Suppose it is known that a random process $x(t)$ is Gaussian
and

$m_x(t) = \sin \pi t$, (3.96a)

$\mathcal{L}_x(t, s) = e^{-|t - s|}$. (3.96b)

The set of covariances of two samples $x_1$ and $x_2$ taken at times $t_1 = 1$ and
$t_2 = 3/2$ is

$\sigma_1^2 = \lambda_{11} = e^{-|t_1 - t_1|} = 1$,

$\sigma_2^2 = \lambda_{22} = e^{-|t_2 - t_2|} = 1$,

$\rho_{12}\sigma_1\sigma_2 = \lambda_{12} = \lambda_{21} = e^{-|t_1 - t_2|} = e^{-1/2}$.

The vector of the means is

$\mathbf{m} = (m_x(t_1), m_x(t_2)) = (0, -1)$.

It follows from Eq. 3.47 (or Eq. 3.90) that the joint density function is

$p_{x_1, x_2}(\alpha, \beta) = \frac{1}{2\pi\sqrt{1 - e^{-1}}} \exp\left\{ -\frac{1}{2(1 - e^{-1})} \left[ \alpha^2 - \frac{2}{\sqrt{e}}\,\alpha(\beta + 1) + (\beta + 1)^2 \right] \right\}$.
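The example can be checked numerically. The following is a minimal sketch, assuming only the mean and covariance functions of Eq. 3.96: it builds the covariances of the two samples, evaluates the general bivariate Gaussian density, and compares it with the closed form stated for $t_1 = 1$, $t_2 = 3/2$. The function names (`mean_fn`, `cov_fn`, `joint_density`, `closed_form`) are illustrative and do not come from the text.

```python
import math

def mean_fn(t):
    """Mean function of Eq. 3.96a: m_x(t) = sin(pi t)."""
    return math.sin(math.pi * t)

def cov_fn(t, s):
    """Covariance function of Eq. 3.96b: exp(-|t - s|)."""
    return math.exp(-abs(t - s))

def joint_density(alpha, beta, t1=1.0, t2=1.5):
    """General bivariate Gaussian density of the samples x(t1), x(t2)."""
    m1, m2 = mean_fn(t1), mean_fn(t2)
    l11, l22, l12 = cov_fn(t1, t1), cov_fn(t2, t2), cov_fn(t1, t2)
    det = l11 * l22 - l12 ** 2            # determinant of the covariance matrix
    a, b = alpha - m1, beta - m2          # deviations from the means
    quad = (l22 * a * a - 2.0 * l12 * a * b + l11 * b * b) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def closed_form(alpha, beta):
    """Closed form for t1 = 1, t2 = 3/2 (means 0 and -1, rho = e**-0.5)."""
    k = 1.0 - math.exp(-1.0)
    bracket = (alpha ** 2
               - (2.0 / math.sqrt(math.e)) * alpha * (beta + 1.0)
               + (beta + 1.0) ** 2)
    return math.exp(-bracket / (2.0 * k)) / (2.0 * math.pi * math.sqrt(k))

if __name__ == "__main__":
    # The two evaluations should agree to within rounding error.
    print(joint_density(0.3, -0.7), closed_form(0.3, -0.7))
```

Changing `t1` and `t2` in `joint_density` gives the density for any other pair of sampling instants, which is the sense in which the two functions of Eq. 3.96 specify the process.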

The Correlation Function 

In addition to the covariance function of a random process $x(t)$, we
frequently encounter the correlation function, denoted $\mathcal{R}_x(t, s)$ and
defined as

$\mathcal{R}_x(t, s) = E[x(t)\,x(s)] = \overline{x(t)\,x(s)}$. (3.97)

From Eq. 3.95 we see that $\mathcal{R}_x(t, s)$ and $\mathcal{L}_x(t, s)$ are related by

$\mathcal{L}_x(t, s) = \overline{[x(t) - m_x(t)][x(s) - m_x(s)]}$

$= \overline{x(t)\,x(s)} - m_x(t)\,\overline{x(s)} - m_x(s)\,\overline{x(t)} + m_x(t)\,m_x(s)$

$= \mathcal{R}_x(t, s) - m_x(t)\,m_x(s)$. (3.98)

It follows that a Gaussian process is completely specified by knowledge of
$m_x(t)$ and either $\mathcal{L}_x(t, s)$ or $\mathcal{R}_x(t, s)$.

Finally, it should be noted that all three of these functions, say
$m_y(t)$, $\mathcal{L}_y(t, s)$, and $\mathcal{R}_y(t, s)$, may also be defined for any process $y(t)$ that
is not Gaussian. In the non-Gaussian case, however, knowledge of these
functions alone does not imply that the process is completely specified.
For example, consider the two random processes $y(t)$ and $z(t)$ shown in
Fig. 3.19; in both cases every sample function is a constant, and it is
clear that

$m_y(t) = m_z(t) = 0$
$\mathcal{L}_y(t, s) = \mathcal{L}_z(t, s) = 1$.

But, for any observation instant $t = t_1$,

$p_{y(t_1)}(\alpha) = \tfrac{1}{4}[\delta(\alpha - \sqrt{2}) + 2\delta(\alpha) + \delta(\alpha + \sqrt{2})]$,

whereas

$p_{z(t_1)}(\alpha) = \tfrac{1}{2}[\delta(\alpha - 1) + \delta(\alpha + 1)]$.
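The claim that the two processes share a mean and covariance function but not a density can be verified directly. The sketch below is a minimal check, assuming the two amplitude distributions just given; because every sample function is constant in time, the covariance at any pair of instants equals the variance of the amplitude. The dictionary representation and the helper names are illustrative only.

```python
import math

# Amplitude distributions of the constant sample functions in Fig. 3.19,
# written as {value: probability}.
P_Y = {-math.sqrt(2.0): 0.25, 0.0: 0.5, math.sqrt(2.0): 0.25}
P_Z = {-1.0: 0.5, 1.0: 0.5}

def moment(dist, order):
    """E[v**order] for a discrete distribution."""
    return sum(p * v ** order for v, p in dist.items())

def mean_and_covariance(dist):
    """Mean and covariance (equal to the variance, since samples are constant in t)."""
    mean = moment(dist, 1)
    return mean, moment(dist, 2) - mean ** 2

if __name__ == "__main__":
    print(mean_and_covariance(P_Y))   # mean 0, covariance 1 (up to rounding)
    print(mean_and_covariance(P_Z))   # mean 0, covariance 1
    # A higher moment already separates the two processes.
    print(moment(P_Y, 4), moment(P_Z, 4))
```

The fourth moments differ (2 for $y$, 1 for $z$), so the first two moments cannot specify either process completely.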

Stationary Gaussian Processes 

To be stationary an arbitrary random process must be such that all
joint density functions are invariant to any translation in time origin. For
a Gaussian process $x(t)$, the joint density function $p_{\mathbf{x}}$ of k samples $\{x_i\}$
observed at times $\{t_i\}$, i = 1, 2, ..., k, depends only on the set of means
$\{\bar{x}_i\}$ and covariances $\{\lambda_{ij}\}$. Thus $p_{\mathbf{x}}$ is unaffected by a translation T in time
origin if and only if

$m_x(t_i) = E[x(t_i)] = E[x(t_i + T)] = m_x(t_i + T) = \bar{x}$ = a constant (3.99a)

and also

$\lambda_{ij} = \mathcal{L}_x(t_i, t_j) = E[x(t_i)\,x(t_j)] - \bar{x}^2$

$= E[x(t_i + T)\,x(t_j + T)] - \bar{x}^2 = \mathcal{L}_x(t_i + T, t_j + T)$ (3.99b)

for all $t_i$, $t_j$, and all T. In particular, if we choose $T = -t_j$ in Eq. 3.99b,
we have

$\lambda_{ij} = E[x(t_i - t_j)\,x(0)] - \bar{x}^2$

for all $t_i$, $t_j$, which implies that

$\mathcal{L}_x(t, s) = \mathcal{L}_x(t - s, 0)$ (3.99c)

for all t and s. The covariance function must depend only on the interval
$(t - s)$ between observations and not directly on these observation
instants themselves. A Gaussian process is stationary whenever Eqs. 3.99a
and c are satisfied. In order to simplify notation, it is conventional to
drop the second argument in Eq. 3.99c and write $\mathcal{L}_x(t - s)$ instead of
$\mathcal{L}_x(t - s, 0)$. The conditions that a Gaussian process must meet in order
to be stationary are then

$m_x(t) = \bar{x}$ = a constant (3.100a)

$\mathcal{L}_x(t, s) = \mathcal{L}_x(t - s)$. (3.100b)


Figure 3.19 Two different random processes with the same mean and covariance functions.

An example of a covariance function meeting this condition is Eq. 3.96b.
Given that Eq. 3.100a is satisfied, a requirement equivalent to Eq. 3.100b
(with the same notational convention) is

$\mathcal{R}_x(t, s) = \mathcal{R}_x(t - s)$. (3.100c)


Notice that these conditions are not sufficient to guarantee stationariness
for random processes that are not Gaussian. The following random
process is a counterexample. Let $\Omega$ contain five sample points, and let each
point be assigned probability 1/5. Let $z(t)$ be the process whose sample
functions $z(\omega, t)$ are

$z(\omega_1, t) = -\sqrt{2} \cos t$,
$z(\omega_2, t) = -\sqrt{2} \sin t$,
$z(\omega_3, t) = \sqrt{2}\,(\cos t + \sin t)$, (3.101)
$z(\omega_4, t) = (\cos t - \sin t)$,
$z(\omega_5, t) = (\sin t - \cos t)$.

It can be verified by direct calculation that

$\overline{z(t)} = \frac{1}{5} \sum_{i=1}^{5} z(\omega_i, t) = 0$; for all t

$\overline{z(t)\,z(s)} = \frac{1}{5} \sum_{i=1}^{5} z(\omega_i, t)\,z(\omega_i, s) = \frac{6}{5} \cos (t - s)$.

Thus the conditions of Eq. 3.100 are met.

On the other hand, it is easy to show that $z(t)$ is not stationary. For
instance, consider the two random variables $z_1$ and $z_2$ obtained by observing
$z(t)$ at times $t_1 = 0$ and $t_2 = \pi/4$. We have directly from Eq. 3.101

$p_{z_1}(\alpha) = \tfrac{1}{5}[\delta(\alpha + \sqrt{2}) + \delta(\alpha) + \delta(\alpha - \sqrt{2}) + \delta(\alpha - 1) + \delta(\alpha + 1)]$

$p_{z_2}(\alpha) = \tfrac{1}{5}[2\delta(\alpha + 1) + \delta(\alpha - 2) + 2\delta(\alpha)]$.

Thus $p_{z_1} \neq p_{z_2}$.
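The "direct calculation" can be carried out mechanically. The sketch below is a minimal check, assuming the five equally likely sample functions of Eq. 3.101 with the signs that make the mean vanish; it averages over the five sample points to obtain the mean and correlation, and lists the amplitudes seen at $t = 0$ and $t = \pi/4$. The helper names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)

# The five sample functions of Eq. 3.101, each with probability 1/5.
SAMPLE_FUNCTIONS = [
    lambda t: -SQRT2 * math.cos(t),
    lambda t: -SQRT2 * math.sin(t),
    lambda t: SQRT2 * (math.cos(t) + math.sin(t)),
    lambda t: math.cos(t) - math.sin(t),
    lambda t: math.sin(t) - math.cos(t),
]

def mean(t):
    """Ensemble mean of z(t)."""
    return sum(z(t) for z in SAMPLE_FUNCTIONS) / 5.0

def correlation(t, s):
    """Ensemble average of z(t) z(s)."""
    return sum(z(t) * z(s) for z in SAMPLE_FUNCTIONS) / 5.0

def sample_values(t):
    """Sorted amplitudes of the five sample functions at time t (rounded)."""
    return sorted(round(z(t), 9) for z in SAMPLE_FUNCTIONS)

if __name__ == "__main__":
    print(mean(0.7))                                   # close to 0
    print(correlation(1.3, 0.4), 1.2 * math.cos(0.9))  # the two agree
    print(sample_values(0.0))                          # five distinct amplitudes
    print(sample_values(math.pi / 4))                  # -1 twice, 0 twice, and 2
```

The mean and correlation depend only on $t - s$, yet the set of amplitudes changes with the observation instant, which is why the process is wide-sense but not strictly stationary.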

Gaussian Processes through Linear Filters 

We have argued in Sections 3.2 and 3.3 that filtered impulse noise 
becomes Gaussian when the number of impulses per second becomes 
large. More precisely, we require that the average number of impulses 
occurring during the effective duration of the filter’s impulse response be 
large (as indicated in Fig. 3.16) and that the arrival times of the impulses be 
substantially independent of one another. In general, it follows from the 








Figure 3.20 Two smoothing filters in cascade. The dashed box may be considered
either as a single filter with impulse response $h_3(t)$ and effective duration $\Delta_3$ or as two
filters in cascade. [The effective duration $\Delta$ of any filter response $h(t)$ containing
impulses is zero; the output of such a filter is also impulsive and obviously does not
become Gaussian as $N \to \infty$.]


same argument that the output of a second filter connected in cascade 
behind the first, as shown in Fig. 3.20, also becomes Gaussian. All that is 
required is that the effective duration of the over-all impulse response of 
the pair of filters in cascade should again be sufficiently long. 

The preceding arguments suggest that the output of any linear filter is a
Gaussian process whenever its input is a Gaussian process. Although formal
proof of this fact is mathematically involved, the observation that the
input and output processes $x(t)$ and $y(t)$ in Fig. 3.20 are related by

$y(t) = \int_{-\infty}^{\infty} x(\alpha)\, h_2(t - \alpha)\, d\alpha$ (3.102)

provides further substantiation. Approximating this integral by a sum,

$y(t) \approx \sum_i x(\alpha_i)\, h_2(t - \alpha_i)\, \Delta\alpha_i$,

we note that the conclusion that $y(t)$ is Gaussian is consistent with the
property that a weighted sum of Gaussian random variables is Gaussian.

As mentioned in connection with Eq. 3.10, Eq. 3.102 is an example of
specifying a new random process by means of applying a stated operation
[convolution with $h_2(t)$] to the sample functions of a given process. The
relative mathematical ease with which Gaussian noise can be handled in
communication problems stems from the fact that a Gaussian input to a
linear filter yields a Gaussian output. This, of course, is not true for
non-Gaussian inputs.
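The Riemann-sum view of Eq. 3.102 can be made concrete. The sketch below is illustrative only: the exponential impulse response, the step size, and the truncation span are assumptions chosen for the example, not values from the text. It computes the weights that multiply the input samples for one output instant. If the input samples are additionally taken to be independent zero-mean Gaussian variables with a common variance (a simplifying assumption made here), the output sample is a weighted sum of Gaussian variables, hence Gaussian, with variance equal to the input variance times the sum of the squared weights.

```python
import math

def impulse_response(t, a=2.0):
    """Assumed causal exponential response h(t) = a * exp(-a t) for t >= 0."""
    return a * math.exp(-a * t) if t >= 0.0 else 0.0

def weights(t, step=0.01, span=10.0):
    """Grid points a_i and weights h(t - a_i) * step of the Riemann sum for y(t)."""
    n = int(round(span / step))
    grid = [t - span + i * step for i in range(n + 1)]   # a_i from t - span to t
    return grid, [impulse_response(t - a_i) * step for a_i in grid]

def output_sample(x, t, step=0.01, span=10.0):
    """Approximate y(t) = sum_i x(a_i) h(t - a_i) step for one sample function x."""
    grid, w = weights(t, step, span)
    return sum(w_i * x(a_i) for a_i, w_i in zip(grid, w))

def output_variance(input_variance, t=0.0, step=0.01, span=10.0):
    """Variance of the weighted sum when the x(a_i) are independent, equal variance."""
    _, w = weights(t, step, span)
    return input_variance * sum(w_i ** 2 for w_i in w)

if __name__ == "__main__":
    # A constant input of 1 should give roughly the area under h, which is 1.
    print(output_sample(lambda a: 1.0, 0.0))
    print(output_variance(1.0))
```

The point of the sketch is structural: for any fixed `t`, the output is a fixed linear combination of input samples, which is the property the Gaussian argument relies on.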

3.5 CORRELATION FUNCTIONS AND POWER SPECTRA 

We have seen that the random process at the output of a linear filter is
Gaussian whenever the input is Gaussian. Since any Gaussian process is
specified by its mean and correlation functions, the effect of the linear
filter on a Gaussian input is described completely by the effect of the
linear filter on the mean and correlation functions. We now consider how
to calculate these functions; the results are valid whether or not the input
process is Gaussian.

Figure 3.21 The random process with sample functions $\{y(\omega, t)\}$ results from passing
the random process $z(t)$ through the linear filter $h(t)$.

The Expectation of an Integral

In Fig. 3.21 we show a linear filter $h(t)$
whose input is an arbitrary random process $z(t)$. The sample functions
of the random process $y(t)$ at the filter output are related to the sample
functions of $z(t)$ by the convolution integral

$y(\omega, t) = \int_{-\infty}^{\infty} z(\omega, \alpha)\, h(t - \alpha)\, d\alpha$; all $\omega$ in $\Omega$. (3.103)

From Eq. 3.93 the mean function of $y(t)$ is

$m_y(t) = E[y(t)] = E\left[\int_{-\infty}^{\infty} z(\alpha)\, h(t - \alpha)\, d\alpha\right]$. (3.104)

Simplification of Eq. 3.104 is straightforward when the number of
points in the sample space $\Omega$ is finite. Let us assume that there are k
points $\{\omega_i\}$, i = 1, 2, ..., k, to each of which is assigned probability $P_i$.
Then

$m_y(t) = \sum_{i=1}^{k} P_i\, y(\omega_i, t) = \sum_{i=1}^{k} P_i \int_{-\infty}^{\infty} z(\omega_i, \alpha)\, h(t - \alpha)\, d\alpha$. (3.105a)

By interchanging the order of summation and integration in Eq. 3.105a, we
obtain

$m_y(t) = \int_{-\infty}^{\infty} \sum_{i=1}^{k} P_i\, z(\omega_i, \alpha)\, h(t - \alpha)\, d\alpha$

$= \int_{-\infty}^{\infty} \overline{z(\alpha)}\, h(t - \alpha)\, d\alpha$. (3.105b)

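Under these conditions the autocorrelation function of $y(t)$ may be
obtained by a similar procedure:

$\mathcal{R}_y(t, s) = \overline{y(t)\,y(s)}$

$= E\left[\int_{-\infty}^{\infty} z(\alpha)\, h(t - \alpha)\, d\alpha \int_{-\infty}^{\infty} z(\beta)\, h(s - \beta)\, d\beta\right]$

$= \sum_{i=1}^{k} P_i \int_{-\infty}^{\infty} z(\omega_i, \alpha)\, h(t - \alpha)\, d\alpha \int_{-\infty}^{\infty} z(\omega_i, \beta)\, h(s - \beta)\, d\beta$

$= \sum_{i=1}^{k} P_i \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} z(\omega_i, \alpha)\, z(\omega_i, \beta)\, h(t - \alpha)\, h(s - \beta)\, d\alpha\, d\beta$. (3.106a)

Again interchanging the order of finite summation and integration, we have

$\mathcal{R}_y(t, s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[\sum_{i=1}^{k} P_i\, z(\omega_i, \alpha)\, z(\omega_i, \beta)\right] h(t - \alpha)\, h(s - \beta)\, d\alpha\, d\beta$

$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \overline{z(\alpha)\,z(\beta)}\, h(t - \alpha)\, h(s - \beta)\, d\alpha\, d\beta$. (3.106b)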


The mathematical issues involved in interchanging the order of integration
and expectation become sensitive when the sample space becomes
infinite. Both the interchange and the resulting input-output relations

$m_y(t) = \int_{-\infty}^{\infty} m_z(\alpha)\, h(t - \alpha)\, d\alpha$, (3.107)

$\mathcal{R}_y(t, s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathcal{R}_z(\alpha, \beta)\, h(t - \alpha)\, h(s - \beta)\, d\alpha\, d\beta$, (3.108)

remain valid, however, whenever the double integral of Eq. 3.108 is finite
for all t and s. [23, 65]
For neither of these equations do we require that $z(t)$ be Gaussian.
When $z(t)$, and hence $y(t)$, is Gaussian, however, evaluation of these two
integrals completely specifies the process $y(t)$.
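A discrete-time analogue makes the interchange of expectation and filtering easy to verify. The sketch below is only an illustration under stated assumptions: a finite ensemble of three short sequences with made-up probabilities and a made-up three-tap filter stand in for the sample functions, the probabilities $P_i$, and $h(t)$. It computes the output mean and correlation two ways, by averaging the filtered sample functions and by filtering the input mean and correlation as in Eqs. 3.107 and 3.108.

```python
def convolve(z, h):
    """Discrete convolution y[n] = sum_m z[m] h[n - m] (full length)."""
    y = [0.0] * (len(z) + len(h) - 1)
    for m, z_m in enumerate(z):
        for j, h_j in enumerate(h):
            y[m + j] += z_m * h_j
    return y

def ensemble_mean(functions, probs):
    """Pointwise expectation over a finite ensemble of equal-length sequences."""
    size = len(functions[0])
    return [sum(p * f[n] for f, p in zip(functions, probs)) for n in range(size)]

def ensemble_correlation(functions, probs):
    """R[n][m] = E[f[n] f[m]] over the ensemble."""
    size = len(functions[0])
    return [[sum(p * f[n] * f[m] for f, p in zip(functions, probs))
             for m in range(size)] for n in range(size)]

def filter_correlation(r_z, h):
    """Discrete form of Eq. 3.108: R_y[n][m] = sum_a sum_b R_z[a][b] h[n-a] h[m-b]."""
    size = len(r_z) + len(h) - 1
    r_y = [[0.0] * size for _ in range(size)]
    for a in range(len(r_z)):
        for b in range(len(r_z)):
            for i, h_i in enumerate(h):
                for j, h_j in enumerate(h):
                    r_y[a + i][b + j] += r_z[a][b] * h_i * h_j
    return r_y

# Hypothetical ensemble, probabilities, and filter taps (illustrative values only).
PROBS = [0.5, 0.3, 0.2]
INPUTS = [[1.0, 2.0, 0.0, -1.0], [0.0, -1.0, 3.0, 2.0], [2.0, 2.0, -2.0, 0.5]]
H = [0.5, 0.3, 0.2]

OUTPUTS = [convolve(z, H) for z in INPUTS]
MEAN_OF_OUTPUTS = ensemble_mean(OUTPUTS, PROBS)              # E[y], computed directly
FILTERED_MEAN = convolve(ensemble_mean(INPUTS, PROBS), H)    # m_z passed through h
CORR_OF_OUTPUTS = ensemble_correlation(OUTPUTS, PROBS)       # E[y y], computed directly
FILTERED_CORR = filter_correlation(ensemble_correlation(INPUTS, PROBS), H)

if __name__ == "__main__":
    print(MEAN_OF_OUTPUTS)
    print(FILTERED_MEAN)
```

With a finite ensemble the agreement is exact up to rounding, which mirrors the finite-sample-space argument; the sketch says nothing about the convergence questions that arise for infinite sample spaces.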


Power Spectrum 

Important additional insight into the effect of filtering a random process
$z(t)$, which again need not necessarily be Gaussian, can be gained from
Eq. 3.108 in the special case in which $\mathcal{R}_z(t, s)$ depends only on the interval
$(t - s)$ between the sampling instants t and s. In particular, if this condition
is satisfied, we shall find it possible to investigate the distribution
of mean power in $z(t)$ as a function of frequency. Accordingly, in the rest
of this section we shall assume that

$\mathcal{R}_z(t, s) = \mathcal{R}_z(\tau)$, (3.109)

where

$\tau = t - s$

and the notation is that of Eq. 3.100.

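Equation 3.109, when substituted in Eq. 3.108, implies that

$\mathcal{R}_y(t, s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathcal{R}_z(\alpha - \beta)\, h(t - \alpha)\, h(s - \beta)\, d\alpha\, d\beta$.

Making the change of variables $\nu = t - \alpha$, $\mu = s - \beta$, we obtain

$\mathcal{R}_y(t, s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathcal{R}_z(t - s + \mu - \nu)\, h(\nu)\, h(\mu)\, d\mu\, d\nu$. (3.110a)

Since the right-hand side of this equation depends only on $(t - s)$, we see
that whenever $\mathcal{R}_z(t, s)$ is a function only of $\tau = t - s$, so also is $\mathcal{R}_y(t, s)$:

$\mathcal{R}_z(t, s) = \mathcal{R}_z(\tau) \Rightarrow \mathcal{R}_y(t, s) = \mathcal{R}_y(\tau)$. (3.110b)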

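Equation 3.110a can be simplified if we introduce the Fourier transforms
of $\mathcal{R}_y(\tau)$ and $\mathcal{R}_z(\tau)$, say $\mathcal{S}_y(f)$ and $\mathcal{S}_z(f)$:

$\mathcal{S}_y(f) = \int_{-\infty}^{\infty} \mathcal{R}_y(\tau)\, e^{-j2\pi f\tau}\, d\tau$, (3.111a)

$\mathcal{S}_z(f) = \int_{-\infty}^{\infty} \mathcal{R}_z(\tau)\, e^{-j2\pi f\tau}\, d\tau$. (3.111b)

It follows by inverse transformation that

$\mathcal{R}_y(\tau) = \int_{-\infty}^{\infty} \mathcal{S}_y(f)\, e^{+j2\pi f\tau}\, df$ (3.112a)

and

$\mathcal{R}_z(\tau) = \int_{-\infty}^{\infty} \mathcal{S}_z(f)\, e^{+j2\pi f\tau}\, df$. (3.112b)

When $\tau$ is substituted for $t - s$ and Eq. 3.112b is used to express
$\mathcal{R}_z(\tau + \mu - \nu)$ in terms of $\mathcal{S}_z(f)$, Eq. 3.110a becomes

$\mathcal{R}_y(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathcal{S}_z(f)\, e^{+j2\pi f(\tau + \mu - \nu)}\, h(\mu)\, h(\nu)\, df\, d\mu\, d\nu$

$= \int_{-\infty}^{\infty} \mathcal{S}_z(f)\, e^{j2\pi f\tau}\, df \int_{-\infty}^{\infty} h(\mu)\, e^{j2\pi f\mu}\, d\mu \int_{-\infty}^{\infty} h(\nu)\, e^{-j2\pi f\nu}\, d\nu$.

The integral on $\nu$ is recognized as the filter’s transfer function $H(f)$, and
the integral on $\mu$ is recognized as $H^*(f)$. Thus

$\mathcal{R}_y(\tau) = \int_{-\infty}^{\infty} \mathcal{S}_z(f)\, |H(f)|^2\, e^{j2\pi f\tau}\, df$, (3.113)

and, comparing Eq. 3.113 with Eq. 3.112a, we have

$\mathcal{S}_y(f) = \mathcal{S}_z(f)\, |H(f)|^2$. (3.114)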

We may interpret Eq. 3.114 as follows. First, we note that the mean
square value of the filter output process $y(t)$ is independent of time
whenever Eq. 3.109 is satisfied:

$\overline{y^2(t)} = \mathcal{R}_y(t, t) = \mathcal{R}_y(t - t) = \mathcal{R}_y(0)$. (3.115)

Next, we consider $y(t)$ as an ensemble of voltage or current waveforms
applied across a 1-$\Omega$ resistor, so that $y^2(\omega, t)$ is the instantaneous power
dissipated in the resistor at time t by the waveform associated with sample
point $\omega$. We therefore interpret $\mathcal{R}_y(0)$ as the expected value of the power
dissipated in the resistor at any instant.

From Eqs. 3.112a and 3.114 we have

$\mathcal{R}_y(0) = \int_{-\infty}^{\infty} \mathcal{S}_y(f)\, df = \int_{-\infty}^{\infty} \mathcal{S}_z(f)\, |H(f)|^2\, df$. (3.116)

If we now let $H(f)$ be the particular filter, shown in Fig. 3.22, for which

$H(f) = 1$ for $f_1 < |f| < f_2$; $H(f) = 0$ elsewhere, (3.117a)

we obtain

$\mathcal{R}_y(0) = \int_{-f_2}^{-f_1} \mathcal{S}_z(f)\, df + \int_{f_1}^{f_2} \mathcal{S}_z(f)\, df$. (3.117b)

We shall soon see that $\mathcal{S}_z(f)$ is always an even function of frequency.
Since Eq. 3.117 implies that the mean power delivered by $z(t)$ in any
narrow frequency band of width $\Delta f$ centered on $f$ is approximately
$2\,\mathcal{S}_z(f)\,\Delta f$, as shown in Fig. 3.23, $\mathcal{S}_z(f)$ describes the distribution of
mean power with frequency in the process $z(t)$. For this reason $\mathcal{S}_z(f)$ is
called the power density function of $z(t)$.

Figure 3.22 Ideal bandpass filter. Although not physically realizable (the impulse
response is not identically zero for t < 0), ideal rectangular filters are useful for
purposes of analysis.

Wide-sense stationariness. It is essential in the derivation of Eq. 3.117b
that $\mathcal{R}_z(t, s) = \mathcal{R}_z(t - s)$; if this condition is not met, the Fourier
transformation of Eq. 3.111 cannot be made and the power density function
$\mathcal{S}_z(f)$ is not defined.

Since knowledge of $\mathcal{S}_y(f)$ at the output of a linear filter implies
knowledge of $\mathcal{R}_y(\tau)$, Eq. 3.114 (together with the relation between the mean
functions given by Eq. 3.107) completely describes the effect of a linear
filter on a stationary Gaussian process. When the process is not Gaussian, this is not
the case, although Eqs. 3.115 and 3.116 still permit us to calculate the total
mean square instantaneous power out of the filter. The ability to do this
and to talk about the power density of a random process is sufficiently
important in its own right that processes $z(t)$ which meet the conditions

$m_z(t)$ = constant (3.118a)

$\mathcal{R}_z(t, s) = \mathcal{R}_z(t - s)$ (3.118b)

are given the special name wide-sense stationary.† Stationariness, as we
defined it in Section 3.1, is often called strict-sense stationariness in order
to avoid possible confusion.

Figure 3.23 The mean power delivered by $z(t)$ in a frequency band of width $\Delta f$ centered
on $f$ is equal to the shaded area.

Any strict-sense stationary process is wide-sense stationary, but the 
converse is not true. The process of Eqs. 3.101 is a counterexample. A 
wide-sense stationary Gaussian process is also strict-sense stationary, since 
all of the conditions of Eqs. 3.100 are met. 

Properties of $\mathcal{S}_z(f)$ and $\mathcal{R}_z(\tau)$. Since the power density function $\mathcal{S}_z(f)$
of a wide-sense stationary random process $z(t)$ is the Fourier transform of
the correlation function $\mathcal{R}_z(\tau)$, the properties of the two functions are
intimately related. First, we note that $\mathcal{R}_z(\tau)$ is a real, even function of $\tau$:

$\mathcal{R}_z(-\tau) = \mathcal{R}_z(\tau)$. (3.119)

This follows from the definition of Eq. 3.97; $z(t)$ is real, and

$\mathcal{R}_z(\tau) = \mathcal{R}_z(t - s) = \overline{z(t)\,z(s)}$

$= \overline{z(s)\,z(t)} = \mathcal{R}_z(s - t) = \mathcal{R}_z(-\tau)$.

Equation 3.119 implies that $\mathcal{S}_z(f)$ is a real, even function of $f$. We
prove this by observing that, since $\mathcal{R}_z(\tau)$ is even and $\sin 2\pi f\tau$ is odd,

$\int_{-\infty}^{\infty} \mathcal{R}_z(\tau) \sin 2\pi f\tau\, d\tau = 0$.

But

$\int_{-\infty}^{\infty} \mathcal{R}_z(\tau)\, e^{-j2\pi f\tau}\, d\tau = \int_{-\infty}^{\infty} \mathcal{R}_z(\tau)(\cos 2\pi f\tau - j \sin 2\pi f\tau)\, d\tau$,

hence

$\mathcal{S}_z(f) = \int_{-\infty}^{\infty} \mathcal{R}_z(\tau) \cos 2\pi f\tau\, d\tau$. (3.120)

Since the right-hand side of Eq. 3.120 is an even function of $f$, the proof is
complete.
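Equation 3.120 lends itself to a quick numerical check. The sketch below is illustrative only: the correlation function $e^{-|\tau|}$ is an assumed example (it is the function of Eq. 3.96b, treated here as a correlation function of a zero-mean process), and its known transform $2/(1 + (2\pi f)^2)$ is used for comparison. The integration limits and step are arbitrary choices.

```python
import math

def correlation(tau):
    """Assumed correlation function R_z(tau) = exp(-|tau|), real and even."""
    return math.exp(-abs(tau))

def power_density(f, limit=40.0, step=0.001):
    """Trapezoidal estimate of Eq. 3.120: integral of R_z(tau) cos(2 pi f tau) d tau."""
    n = int(round(2.0 * limit / step))
    total = 0.0
    for i in range(n + 1):
        tau = -limit + i * step
        weight = 0.5 if i in (0, n) else 1.0      # half weight at the end points
        total += weight * correlation(tau) * math.cos(2.0 * math.pi * f * tau)
    return total * step

def exact_density(f):
    """Known transform of exp(-|tau|): 2 / (1 + (2 pi f)^2)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

if __name__ == "__main__":
    for f in (0.0, 0.25, 1.0):
        print(f, power_density(f), exact_density(f))
```

For this example the computed density is real, even in $f$, and positive, in line with the properties derived in this subsection.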

Next, we claim that $\mathcal{S}_z(f)$ must also be a non-negative function:

$\mathcal{S}_z(f) \geq 0$; for all $f$. (3.121)

This is clearly a necessary condition for the interpretation of $\mathcal{S}_z(f)$ as
power density to be meaningful. Proof follows by noting that if Eq. 3.121
were not true an $f_1$ and $f_2$ could be chosen for the rectangular filter in
Fig. 3.24 such that

$\int_{f_1}^{f_2} \mathcal{S}_z(f)\, df < 0$. (3.122)

But, from Eq. 3.117 and the evenness of $\mathcal{S}_z(f)$, this integral is one half the
expected value of the square of the filter output $y(t)$ and thus Eq. 3.122
would be in contradiction to the fact that $y^2(t)$ must be non-negative.

Figure 3.24 Proof (by contradiction) that a power density function cannot be negative.

† In many texts, processes satisfying only Eq. 3.118b are called wide-sense stationary.

The fact that $\mathcal{S}_z(f)$ is non-negative does not imply that $\mathcal{R}_z(\tau)$ is also
non-negative. It does imply that the correlation function of any wide-sense
stationary process $z(t)$ satisfies the inequality

$|\mathcal{R}_z(\tau)| \leq \mathcal{R}_z(0)$; for all $\tau$, (3.123)

since

$|\mathcal{R}_z(\tau)| = \left| \int_{-\infty}^{\infty} \mathcal{S}_z(f)\, e^{j2\pi f\tau}\, df \right|$

$\leq \int_{-\infty}^{\infty} \mathcal{S}_z(f)\, |e^{j2\pi f\tau}|\, df$

$= \int_{-\infty}^{\infty} \mathcal{S}_z(f)\, df = \mathcal{R}_z(0)$.

Equation 3.123 permits interpretation of the conditions under which the
filter input-output relations of Eqs. 3.107 and 3.108 are valid. For wide-sense
stationary processes,

$|\mathcal{R}_y(t, s)| = |\mathcal{R}_y(t - s)| \leq \mathcal{R}_y(0) = \overline{y^2(t)}$,

so that requiring the double integral of Eq. 3.108 to be finite is equivalent
to requiring that the mean power of the output process be finite for all t.
It can be shown that this requirement suffices even when $y(t)$ is not
stationary.

Jointly Gaussian Processes

We have already emphasized that one of the most important properties
of jointly Gaussian random variables is that new random variables obtained
as a result of linear operations performed thereon are also jointly Gaussian.
As we have seen, one application is that samples taken from the output
$y(t)$ of a linear filter whose input $x(t)$ is a Gaussian process are always jointly
Gaussian, which in turn implies that $y(t)$ is also a Gaussian process.

A second application concerns the situation in which $x(t)$ is the input to
two (or more) linear filters connected in parallel, as shown in Fig. 3.25.
Consider the vector of samples

$\mathbf{w} = (y(t_1), y(t_2), \ldots, y(t_k), z(s_1), z(s_2), \ldots, z(s_l))$ (3.124)

obtained by observing the output $y(t)$ of the first filter at times $\{t_i\}$ and
the output $z(t)$ of the second filter at times $\{s_j\}$. Since $\mathbf{w}$ results from linear
operations on $x(t)$, $\mathbf{w}$ is Gaussian for any $\{t_i\}$ and any $\{s_j\}$. The statement
remains true if $\mathbf{w}$ results from sampling N rather than just two filters
connected in parallel. We call N processes jointly Gaussian if every vector
such as $\mathbf{w}$ formed from these processes is jointly Gaussian.

Figure 3.25 If $x(t)$ is a Gaussian process, then the processes $y(t)$ and $z(t)$ are jointly
Gaussian.

Two jointly Gaussian processes $y(t)$ and $z(t)$ are individually specified
whenever their mean and correlation functions are known. In order to
specify the joint density function of vectors such as $\mathbf{w}$, however, we must
know the covariances associated with every pair of components. Thus, if
$y(t)$ and $z(t)$ are to be jointly specified, we must also know the covariance

$E[y(t_i)\, z(s_j)] - m_y(t_i)\, m_z(s_j)$ (3.125)

for any pair of observation instants $(t_i, s_j)$. The additional knowledge
that we need is embodied in the function

$\mathcal{R}_{yz}(t, s) = E[y(t)\, z(s)]$; all t and s, (3.126)

which is called the crosscorrelation function of the processes $y(t)$ and $z(t)$.

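For the case illustrated by Fig. 3.25, the crosscorrelation function is
readily obtained from $x(t)$ and the two filter impulse responses $h_y(t)$ and
$h_z(t)$.

$\mathcal{R}_{yz}(t, s) = \overline{y(t)\, z(s)} = E\left[\int_{-\infty}^{\infty} x(t - \alpha)\, h_y(\alpha)\, d\alpha \int_{-\infty}^{\infty} x(s - \beta)\, h_z(\beta)\, d\beta\right]$

$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \overline{x(t - \alpha)\, x(s - \beta)}\, h_y(\alpha)\, h_z(\beta)\, d\alpha\, d\beta$

$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathcal{R}_x(t - \alpha, s - \beta)\, h_y(\alpha)\, h_z(\beta)\, d\alpha\, d\beta$. (3.127a)

If $x(t)$ is stationary, this simplifies to

$\mathcal{R}_{yz}(t, s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathcal{R}_x(t - s + \beta - \alpha)\, h_y(\alpha)\, h_z(\beta)\, d\alpha\, d\beta$

$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \mathcal{S}_x(f)\, e^{j2\pi f(t - s + \beta - \alpha)}\, h_y(\alpha)\, h_z(\beta)\, df\, d\alpha\, d\beta$

$= \int_{-\infty}^{\infty} \mathcal{S}_x(f)\, e^{j2\pi f(t - s)}\, df \int_{-\infty}^{\infty} h_y(\alpha)\, e^{-j2\pi f\alpha}\, d\alpha \int_{-\infty}^{\infty} h_z(\beta)\, e^{+j2\pi f\beta}\, d\beta$. (3.127b)

Recognizing the integrals on $\alpha$ and $\beta$ as $H_y(f)$ and $H_z^*(f)$, respectively,
we have

$\mathcal{R}_{yz}(t - s) = \int_{-\infty}^{\infty} \mathcal{S}_x(f)\, H_y(f)\, H_z^*(f)\, e^{j2\pi f(t - s)}\, df$. (3.128)

Equation 3.128 is our desired result. Since $\mathcal{R}_{yz}(t, s)$, as well as $\mathcal{R}_y(t, s)$
and $\mathcal{R}_z(t, s)$, depends only on $(t - s)$ when $x(t)$ is stationary, we observe
that the density function of any vector such as $\mathbf{w}$ is independent of time
origin whenever the input $x(t)$ is a stationary Gaussian process. In this
case $y(t)$ and $z(t)$ are called “jointly stationary.”†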

An important particular case occurs when $H_y(f)$ and $H_z(f)$ are
nonoverlapping, as shown in Fig. 3.26. Equation 3.128 then states that

$\mathcal{R}_{yz}(t - s) = 0$; for all t and s. (3.129)

In addition, $x(t)$ stationary implies $m_x(t)$ a constant, so that $m_y(t)\, m_z(t) = 0$.
(At least one of the filters must have zero response to a constant (dc)
input if they are nonoverlapping.) Thus any covariance involving both
$y(t)$ and $z(t)$, as in Eq. 3.125, must be zero. For $x(t)$ Gaussian as well as
stationary it follows that

$p_{\mathbf{w}} = p_{\mathbf{y}(\mathbf{t}), \mathbf{z}(\mathbf{s})} = p_{\mathbf{y}(\mathbf{t})}\, p_{\mathbf{z}(\mathbf{s})}$ (3.130)

for any vectors

$\mathbf{y}(\mathbf{t}) = (y(t_1), y(t_2), \ldots, y(t_k))$

$\mathbf{z}(\mathbf{s}) = (z(s_1), z(s_2), \ldots, z(s_l))$.

When Eq. 3.130 is satisfied for all $\{t_i\}$ and $\{s_j\}$, we say that the processes
$y(t)$ and $z(t)$ are statistically independent.

Figure 3.26 The filters $H_y(f)$ and $H_z(f)$ are disjoint in frequency.

† If $x(t)$ is stationary only in the wide sense, $y(t)$ and $z(t)$ are called “jointly wide-sense
stationary.”

White Gaussian Noise 

When dealing with a Gaussian process, say $x(t)$, it is frequently convenient
to decompose the process into the sum of its mean function and a
zero-mean noise term, say $n(t)$. Thus we let

$x(t) = m_x(t) + n(t)$, (3.131a)

where $n(t)$ is a Gaussian process with zero mean:

$\overline{n(t)} = \overline{x(t)} - m_x(t) = 0$; for all t. (3.131b)

In most applications of interest, such as the shot noise of Eq. 3.26, the
mean function $m_x(t)$ represents a known (nonrandom) signal term, and
the Gaussian noise process $n(t)$ is (strict-sense) stationary. Since $\overline{n(t)} = 0$,
the covariance function $\mathcal{L}_n(t, s)$ is then (from Eq. 3.98) equal to the
correlation function:

$\mathcal{L}_n(t, s) = \mathcal{R}_n(t, s) = \mathcal{R}_n(\tau)$; $\tau = t - s$. (3.131c)

Thus the Fourier transform of $\mathcal{R}_n(\tau)$, that is, the power density function
$\mathcal{S}_n(f)$, completely specifies the zero-mean process $n(t)$.

In many communication applications we are confronted with physical
noise sources in which the Gaussian noise added onto the desired signal
has a power spectrum that is essentially flat up to frequencies much
higher than those that are significant in the signal itself. In such cases
Eqs. 3.115 and 3.116 imply that the mean square value of the noise
interference can be reduced (without adversely affecting the desired signal) by
passing the sum of signal and noise through a filter $H(f)$ that passes the
signal without important change but eliminates much of the noise, as
shown in Fig. 3.27. Insofar as the power spectrum of the noise at the
filter output is concerned, it makes little difference precisely how the
input-noise power spectrum approaches zero outside the passband of $H(f)$.
Accordingly, one frequently assumes that this input spectrum is flat for
all frequencies and introduces the concept of white Gaussian noise,
denoted $n_w(t)$ and defined as a stationary, zero-mean Gaussian process
with power spectrum†

$\mathcal{S}_w(f) = \frac{N_0}{2}$; $-\infty < f < \infty$. (3.132)

Figure 3.27 (Block diagram: signal plus input noise, filter $H(f)$, signal plus output noise.)
Wideband Gaussian noise at the input to a narrow band filter. The filter
output is substantially the same as it would be if the input noise were white and Gaussian.


Actually, white noise (whether Gaussian or not) must be fictitious
because its total mean power would be

$\int_{-\infty}^{\infty} \mathcal{S}_w(f)\, df = \infty$, (3.133a)

which is not meaningful. The utility of the concept of white noise derives
from the fact that such a noise, when passed through a linear filter for
which

$\int_{-\infty}^{\infty} |H(f)|^2\, df < \infty$, (3.133b)

produces at the filter output a stationary, zero-mean noise $n(t)$ that is
meaningful. From Eqs. 3.114 and 3.132 we have

$\mathcal{S}_n(f) = \frac{N_0}{2}\, |H(f)|^2$, (3.134a)

and thus

$\overline{n^2(t)} = \frac{N_0}{2} \int_{-\infty}^{\infty} |H(f)|^2\, df$, (3.134b)

which, by Eq. 3.133b, is finite. The correlation function at the output, from
Eqs. 3.120 and 3.134a, is

$\mathcal{R}_n(\tau) = \frac{N_0}{2} \int_{-\infty}^{\infty} |H(f)|^2 \cos 2\pi f\tau\, df$. (3.135)

† The dimensions of $N_0$ are watts per cycle per second, or joules. We shall always
define power spectra on a bilateral frequency basis, $-\infty < f < \infty$.
2 J — CO 


An alternative derivation of Eq. 3.135 follows directly from the
correlation function of white noise. We note that

$\mathcal{S}_w(f) = \frac{N_0}{2} = \int_{-\infty}^{\infty} \frac{N_0}{2}\, \delta(\tau)\, e^{-j2\pi f\tau}\, d\tau$. (3.136a)

Thus, in accordance with Eq. 3.111, we ascribe to $n_w(t)$ the correlation
function

$\mathcal{R}_w(\tau) = \frac{N_0}{2}\, \delta(\tau)$, (3.136b)

which is again a nonphysical but useful result. Equation 3.136b implies
that any two samples of white Gaussian noise, no matter how closely
together in time they are taken, are statistically independent. In a sense,
white Gaussian noise represents the ultimate in “randomness.” Substituting
Eq. 3.136b in Eq. 3.110a, with $t - s = \tau$, we have

$\mathcal{R}_n(\tau) = \frac{N_0}{2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \delta(\tau + \mu - \nu)\, h(\mu)\, h(\nu)\, d\mu\, d\nu$

$= \frac{N_0}{2} \int_{-\infty}^{\infty} h(\nu - \tau)\, h(\nu)\, d\nu$. (3.137)

Expressing $h(\nu)$ as the inverse Fourier transform of $H(f)$ and interchanging
the order of integration again leads to Eq. 3.135. The integral in Eq. 3.137
is frequently referred to as the “correlation function” of the (deterministic)
function $h(t)$.
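Equation 3.137 can be evaluated in closed form for simple filters, which gives a convenient numerical check. The sketch below is an illustration under stated assumptions: the single-pole response $h(t) = a e^{-at}$, $t \geq 0$, and the values of `a` and `n0` are chosen for the example and are not from the text. For this response the integral of Eq. 3.137 works out to $(N_0 a / 4)\, e^{-a|\tau|}$, and the code compares a midpoint-rule estimate with that expression.

```python
import math

def h(t, a=2.0):
    """Assumed causal single-pole impulse response h(t) = a * exp(-a t), t >= 0."""
    return a * math.exp(-a * t) if t >= 0.0 else 0.0

def output_correlation(tau, n0=1.0, a=2.0, step=0.0005, upper=20.0):
    """Midpoint-rule estimate of Eq. 3.137: (N0/2) * integral of h(v - tau) h(v) dv."""
    start = max(tau, 0.0)                 # the integrand vanishes below this point
    n = int(round((upper - start) / step))
    total = 0.0
    for i in range(n):
        v = start + (i + 0.5) * step
        total += h(v - tau, a) * h(v, a)
    return 0.5 * n0 * total * step

def closed_form(tau, n0=1.0, a=2.0):
    """Closed form for the assumed filter: (N0 a / 4) exp(-a |tau|)."""
    return 0.25 * n0 * a * math.exp(-a * abs(tau))

if __name__ == "__main__":
    for tau in (0.0, 0.3, -0.5):
        print(tau, output_correlation(tau), closed_form(tau))
```

At $\tau = 0$ the result is the output noise power, $N_0 a / 4$ for this filter, which agrees with Eq. 3.134b since $\int |H(f)|^2\, df = a/2$ here.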

As an example of the application of these results, consider the ideal
lowpass filter shown in Fig. 3.28, whose transfer function is given by

$W(f) = 1$ for $|f| < W$; $W(f) = 0$ elsewhere. (3.138)

Figure 3.28 White noise into an ideal lowpass filter.

When the input to this filter is white Gaussian noise, $n_w(t)$, the mean
function $m_n(t)$ of the output $n(t)$ is

$m_n(t) = \int_{-\infty}^{\infty} \overline{n_w(\alpha)}\, h(t - \alpha)\, d\alpha$.

But, from the definition of $n_w(t)$,

$\overline{n_w(\alpha)} = 0$; for all $\alpha$,

so that

$m_n(t) = 0$; for all t. (3.139)

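The correlation and covariance functions at the output, from Eqs. 3.131c
and 3.135, are

$\mathcal{R}_n(\tau) = \mathcal{L}_n(\tau) = \int_{-\infty}^{\infty} \mathcal{S}_w(f)\, |W(f)|^2 \cos 2\pi f\tau\, df$

$= \frac{N_0}{2} \int_{-W}^{W} \cos 2\pi f\tau\, df$

$= \frac{N_0}{2} \left. \frac{\sin 2\pi f\tau}{2\pi\tau} \right|_{-W}^{W} = W N_0\, \frac{\sin 2\pi W\tau}{2\pi W\tau}$. (3.140)

Hence

$\mathcal{L}_n(0) = \mathcal{R}_n(0) = \overline{n^2(t)} = W N_0$; for all t.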


Now consider k samples $\{n_i\}$ taken from the output process $n(t)$ at the
time instants $\{t_i\}$ given by

$t_i = \frac{i}{2W} + T$; i = 1, 2, ..., k, (3.141a)

where T is any constant. It is interesting to note that the $\{n_i\}$ are statistically
independent with zero mean and variance $W N_0$:

$\overline{n_i} = \overline{n(t_i)} = m_n(t_i) = 0$, (3.141b)

$\lambda_{ij} = \mathcal{L}_n(t_i - t_j) = \mathcal{L}_n\!\left(\frac{i - j}{2W}\right)$

$= W N_0\, \frac{\sin \pi(i - j)}{\pi(i - j)} = W N_0$ for $i = j$, and $0$ otherwise. (3.141c)

Thus the density function of the k Gaussian random variables $\{n_i\}$ is

$p_{\mathbf{n}}(\boldsymbol{\alpha}) = (2\pi W N_0)^{-k/2} \exp\left( -\frac{1}{2 W N_0} \sum_{i=1}^{k} \alpha_i^2 \right)$.
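The vanishing of the off-diagonal covariances is easy to confirm numerically. The sketch below is a minimal check, assuming the correlation function of Eq. 3.140; the numerical values of `n0`, `w`, and `offset` (the constant T) are arbitrary illustrative choices, and the function names are not from the text.

```python
import math

def lowpass_noise_correlation(tau, n0, w):
    """Eq. 3.140: R_n(tau) = W N0 sin(2 pi W tau) / (2 pi W tau), with R_n(0) = W N0."""
    x = 2.0 * math.pi * w * tau
    if x == 0.0:
        return w * n0
    return w * n0 * math.sin(x) / x

def covariance_matrix(k, n0, w, offset=0.0):
    """Covariances of samples taken at t_i = i / (2W) + offset, i = 1..k (Eq. 3.141)."""
    times = [i / (2.0 * w) + offset for i in range(1, k + 1)]
    return [[lowpass_noise_correlation(ti - tj, n0, w) for tj in times]
            for ti in times]

if __name__ == "__main__":
    for row in covariance_matrix(4, n0=2.0, w=5.0, offset=0.37):
        print(["%.3e" % value for value in row])
```

The printed matrix is diagonal to within rounding error, with $W N_0$ on the diagonal; sampling at any other spacing generally leaves nonzero off-diagonal terms.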

APPENDIX 3A MATRIX NOTATION 

Matrix notation simplifies dealing with linear transformations. Consider,
for example, the set of linear equations

$y_1 = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1k}x_k + m_1$

$y_2 = a_{21}x_1 + a_{22}x_2 + \cdots + a_{2k}x_k + m_2$

$\vdots$

$y_k = a_{k1}x_1 + a_{k2}x_2 + \cdots + a_{kk}x_k + m_k$. (3A.1)

We may say that the variables $\{x_i\}$, i = 1, 2, ..., k, are linearly
transformed into the new variables $\{y_j\}$, j = 1, 2, ..., k. In matrix notation
these equations would be written more concisely as

$\mathbf{y}^T = \mathbf{A}\mathbf{x}^T + \mathbf{m}^T$. (3A.2)

Definitions 

In order to give explicit meaning to Eq. 3A.2, several definitions are 
necessary. 

1. An (n × k) matrix $\mathbf{B}$ is defined as an n-row, k-column array of
numbers such as

$\mathbf{B} = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{pmatrix}$. (3A.3)

2. The (i, j)th element, $b_{ij}$, of a matrix $\mathbf{B}$ is the number that is located at
the intersection of the ith row and the jth column.

3. The transpose of an (n × k) matrix $\mathbf{B}$ is the (k × n) matrix, denoted
$\mathbf{B}^T$, obtained by interchanging the rows and columns of $\mathbf{B}$. An equivalent
statement is that the (i, j)th element of $\mathbf{B}^T$ is the (j, i)th element of $\mathbf{B}$. The
transpose of matrix $\mathbf{B}$ in Eq. 3A.3 is

$\mathbf{B}^T = \begin{pmatrix} b_{11} & b_{21} & \cdots & b_{n1} \\ b_{12} & b_{22} & \cdots & b_{n2} \\ \vdots & \vdots & & \vdots \\ b_{1k} & b_{2k} & \cdots & b_{nk} \end{pmatrix}$. (3A.4)

4. We call a 1 × k (single-row) matrix a vector. For example,

$\mathbf{z} = (z_1, z_2, \ldots, z_k)$. (3A.5a)

The transpose of a row matrix $\mathbf{z}$, denoted $\mathbf{z}^T$, is a k × 1 (single-column)
matrix,

$\mathbf{z}^T = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_k \end{pmatrix}$. (3A.5b)

5. Two matrices are said to be equal if and only if every pair of
corresponding elements is equal. Thus the equation $\mathbf{A} = \mathbf{B}$ implies

$a_{ij} = b_{ij}$; for all i and j. (3A.6)

6. The sum $[\mathbf{A} + \mathbf{B}]$ of two (n × k) matrices $\mathbf{A}$ and $\mathbf{B}$ is the new
(n × k) matrix $\mathbf{C}$ whose elements are given by

$c_{ij} = a_{ij} + b_{ij}$; for all i and j. (3A.7a)

Thus

$\mathbf{C} = \mathbf{A} + \mathbf{B}$ (3A.7b)

if and only if Eq. 3A.7a is satisfied. Matrix addition, like arithmetic
addition, is associative and commutative. The sum of two matrices that
do not have the same dimensions (n × k) is not defined.

7. The scalar product of an (n × k) matrix $\mathbf{A}$ by a constant c is the new
(n × k) matrix $\mathbf{B}$ whose elements are given by

$b_{ij} = c\,a_{ij}$; for all i and j. (3A.8a)

Thus

$\mathbf{B} = c\mathbf{A}$ (3A.8b)

if and only if Eq. 3A.8a is satisfied. Scalar multiplication is associative
and commutative.

8. The matrix product $\mathbf{AB}$ of an (n × k) matrix $\mathbf{A}$ by a (k × m) matrix
$\mathbf{B}$ is the new (n × m) matrix $\mathbf{C}$ whose elements are given by

$c_{ij} = \sum_{l=1}^{k} a_{il}\, b_{lj}$; for all i and j. (3A.9a)

Thus

$\mathbf{C} = \mathbf{AB}$ (3A.9b)

if and only if Eq. 3A.9a is satisfied. If the number of columns in the first
matrix, $\mathbf{A}$, is not equal to the number of rows in the second matrix,
$\mathbf{B}$, the two matrices are said to be nonconformable and the product $\mathbf{AB}$ is
not defined. Thus the matrix product of two vectors is not defined; but
the matrix product of a k-component vector $\mathbf{x}$ and a k-component
transposed vector $\mathbf{y}^T$ is identical to the vector dot product of $\mathbf{x}$ and $\mathbf{y}$:

$\mathbf{x}\mathbf{y}^T = \sum_{i=1}^{k} x_i y_i = \mathbf{x} \cdot \mathbf{y}$. (3A.10)

Equation 3A.10 is an important relation which we shall use frequently.
It is immediately helpful in visualizing the meaning of Eq. 3A.9a. As
shown in Fig. 3A.1, we can think of $c_{ij}$ as the dot product of the vector
$\mathbf{a}_{i\cdot}$ that corresponds to the ith row of $\mathbf{A}$ and the vector $\mathbf{b}_{\cdot j}$ whose transpose
corresponds to the jth column of $\mathbf{B}$. Thus

$c_{ij} = \mathbf{a}_{i\cdot} \cdot \mathbf{b}_{\cdot j}$. (3A.11)

The notation $\mathbf{a}_{i\cdot}$ and $\mathbf{b}_{\cdot j}$ is mnemonic in that the dots indicate indices
ranging over the dimension of the vector; for a k-column matrix $\mathbf{A}$ and
a k-row matrix $\mathbf{B}$, $\mathbf{a}_{i\cdot} = (a_{i1}, a_{i2}, \ldots, a_{ik})$ and $\mathbf{b}_{\cdot j} = (b_{1j}, b_{2j}, \ldots, b_{kj})$.
Equation 3A.11 may be visualized in terms of picking up the jth column
of $\mathbf{B}$, laying it horizontally over the ith row of $\mathbf{A}$, multiplying the
superimposed numbers by pairs, and summing the products. As an example,
a product of a (2 × 3) matrix and a (3 × 4) matrix yields

$\begin{pmatrix} 15 & 20 & 26 & 14 \\ 24 & 36 & 30 & 26 \end{pmatrix}$,

which may be readily verified by inspection; for instance, the (2, 3)
element in the product is computed

$\mathbf{a}_{2\cdot} = (2 \;\; 0 \;\; 6)$,
$\mathbf{b}_{\cdot 3} = (6 \;\; 2 \;\; 3)$,

$c_{23} = 12 + 0 + 18 = 30$.

Figure 3A.1 Matrix multiplication. The matrix $\mathbf{C} = \mathbf{AB}$ is an n × m matrix with
elements $\{c_{ij}\}$ that may be obtained as shown:
$c_{ij} = \mathbf{a}_{i\cdot} \cdot \mathbf{b}_{\cdot j} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{ik}b_{kj}$.


The foregoing definitions are sufficient to explain the meaning of 
Eq. 3A.2. We take the matrix A to be the square ( k x k) matrix whose 
elements are the coefficients {a_{ij}} in Eq. 3A.1 and let 

(3A.12) 

The product Ax^T is therefore a (k x 1) column matrix, and equating 
corresponding elements on the right- and left-hand sides of Eq. 3A.2 
reproduces the set of equations in Eq. 3A.1. 

Matrix notation is especially helpful when one is confronted with a 
sequence of linear transformations. For example, if the k variables 
y = (y_1, y_2, ..., y_k) in Eq. 3A.1 are subsequently transformed into l new 
variables z = (z_1, z_2, ..., z_l) by means of a linear transformation 

z_i = \sum_{j=1}^{k} b_{ij} y_j;  i = 1, 2, ..., l,  (3A.13a) 

then 

z^T = B y^T.  (3A.13b) 

If we wish to find the z's in terms of the x's, we substitute Eq. 3A.13b in 
Eq. 3A.2 and obtain 

z^T = B[A x^T + m^T].  (3A.13c) 

Properties of Matrix Multiplication 

The definition of matrix multiplication (Eq. 3A.9) implies certain prop- 
erties that are important. 

1. Matrix multiplication and addition are distributive; that is, 

A(B + C) = AB + AC.  (3A.14) 

This can be verified directly from the definition. 

2. Matrix multiplication is associative ; that is, 

(AB)C = A(BC). (3A.15) 

This can be verified, with some labor, by showing that the (i,j)th element, 
say d_{ij}, of the triple product is given by 

d_{ij} = \sum_{l} \sum_{m} a_{il} b_{lm} c_{mj}  (3A.16) 

regardless of which multiplication is carried out first.† 

3. Matrix multiplication is not generally commutative ; that is, 

AB \neq BA.  (3A.17) 

Indeed, two matrices conformable in one order need not be conformable 
in the other. Even in the case of two (k x k) matrices, however, 
multiplication is not usually commutative. For example, 

\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},  but  \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}. 

† Use of these first two properties permits us to simplify Eq. 3A.13c still further: we 
may write z^T = C x^T + n^T, where C = BA and n^T = B m^T. Thus a sequence of linear 
transformations is a linear transformation. 

4. The transpose of a matrix product is the commuted product of the 
transposes; that is 

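(AB)^T = B^T A^T.  (3A.18) 

This property is easily proved. First consider the left-hand side of 
Eq. 3A.18. The (i,j)th element of (AB)^T is the (j,i)th element of AB, 
which by Eq. 3A.11 is a_{j.} \cdot b_{.i}. Next consider the right-hand side: the ith 
row of B^T is the ith column of B, and the jth column of A^T is the jth row 
of A. Hence the (i,j)th element of B^T A^T is also a_{j.} \cdot b_{.i}. As an example, 

\left[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \right]^T = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}^T = \begin{pmatrix} 19 & 43 \\ 22 & 50 \end{pmatrix} = \begin{pmatrix} 5 & 7 \\ 6 & 8 \end{pmatrix} \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}. 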


5. A number b equal to a double sum of the form 

b = \sum_{i=1}^{k} \sum_{j=1}^{k} x_i a_{ij} y_j = x_1 a_{11} y_1 + x_1 a_{12} y_2 + ... + x_1 a_{1k} y_k 
    + x_2 a_{21} y_1 + x_2 a_{22} y_2 + ... + x_2 a_{2k} y_k 
    + ... 
    + x_k a_{k1} y_1 + x_k a_{k2} y_2 + ... + x_k a_{kk} y_k  (3A.19a) 

can be written succinctly 

b = x A y^T,  (3A.19b) 

where x = (x_1, x_2, ..., x_k), y = (y_1, y_2, ..., y_k), and A is the (k x k) 
matrix with elements {a_{ij}}. This type of sum is called a bilinear form. 
When y = x, it is called a quadratic form. Expressions of this kind are 
useful in Section 3.3. 
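
Properties 3 to 5 can be checked numerically on small cases. The sketch below is illustrative only; the helper names `matmul`, `transpose`, and `bilinear` are not from the text, matrices are assumed to be lists of rows, and the vectors passed to `bilinear` are arbitrary.

```python
def matmul(A, B):
    """Row-by-column product of conformable matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Interchange the rows and columns of A."""
    return [list(col) for col in zip(*A)]


def bilinear(x, A, y):
    """b = x A y^T for row vectors x and y (Eq. 3A.19b), written as the double sum."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


# Property 3: the two orders of multiplication give different results.
A = [[1, 0], [1, 0]]
B = [[1, 1], [0, 0]]
print(matmul(A, B))  # [[1, 1], [1, 1]]
print(matmul(B, A))  # [[2, 0], [0, 0]]

# Property 4: (PQ)^T equals Q^T P^T.
P = [[1, 2], [3, 4]]
Q = [[5, 6], [7, 8]]
print(transpose(matmul(P, Q)) == matmul(transpose(Q), transpose(P)))  # True

# Property 5: a bilinear form evaluated for arbitrary x and y.
print(bilinear([1, 2], P, [1, 1]))  # 17
```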


Inverse Matrices 

The last matrix concept we shall consider is that of an inverse. The 
inverse of a square (k x k) matrix A is written A^{-1} and is also a (k x k) 
matrix. If the set of k equations 

y^T = A x^T  (3A.20a) 

can be solved uniquely for the k x's in terms of the k y's, A^{-1} is the matrix 
of coefficients in the resulting equations. Thus, if Eq. 3A.20a implies that 

x^T = B y^T,  (3A.20b) 

then 

B = A^{-1}.  (3A.20c) 


Combining Eq. 3A.20c with Eqs. 3A.20a and 3A.20b, we have both 

y^T = A(B y^T) = (A A^{-1}) y^T  (3A.21a) 

and 

x^T = B(A x^T) = (A^{-1} A) x^T.  (3A.21b) 

It follows that 

A A^{-1} = A^{-1} A = I,  (3A.22) 

where I is the diagonal matrix 

I = \begin{pmatrix} 1 & & & 0 \\ & 1 & & \\ & & \ddots & \\ 0 & & & 1 \end{pmatrix}  (3A.23) 

in which all off-diagonal elements are zero (symbolized by the large 0’s), 
and all principal diagonal elements (of the form c_{ii}) are unity. 

The matrix I is called the identity matrix and has the property that it 
transforms any matrix into itself : 

CI = IC = C.  (3A.24) 

Equation 3A.22 is taken to be the definition of inverse: the matrix A^{-1} 
inverse to A is that matrix which, when premultiplied or postmultiplied 
by A, yields the identity matrix. It is clear from the definition that 

(A^{-1})^{-1} = A.  (3A.25) 

When matrix A does not correspond to a reversible transformation 
(that is, if the x's in Eq. 3A.20a cannot be uniquely determined from 
knowledge of the y’s and vice versa), the matrix inverse to A is not defined 
and A is called singular. This must always be the case when A is not 
square. When A is square, it is singular whenever the simultaneous 
equations of Eq. 3A.20a are linearly dependent, which implies that the 
determinant of A, denoted |A|, is zero: 

|A| = 0 \Leftrightarrow A is singular.  (3A.26) 

Otherwise, A is nonsingular and A^{-1} exists. 

The elements comprising A^{-1} are given directly by solving Eq. 3A.20a 
to obtain Eq. 3A.20b. If B = A^{-1}, then we know from the elementary 
theory of determinants^{37} that 

b_{ij} = (-1)^{i+j} \frac{|A_{ji}|}{|A|},  (3A.27) 

where A_{ji} is the matrix obtained from A by deleting the jth row and ith 
column, and |A_{ji}| is its determinant. Note that the order of the indices i and 
j is different on the two sides of Eq. 3A.27. 

As an example, the inverse of the matrix 

A = \begin{pmatrix} 1 & 1 & 0 \\ 2 & 1 & 3 \\ 1 & 4 & 1 \end{pmatrix};  |A| = -10, 

is 

A^{-1} = \frac{1}{10} \begin{pmatrix} 11 & 1 & -3 \\ -1 & -1 & 3 \\ -7 & 3 & 1 \end{pmatrix}.  (3A.28) 

It can be readily verified that Eq. 3A.22 is satisfied. 

The last result we shall need is 

(AB)^{-1} = B^{-1} A^{-1}.  (3A.29) 

This follows directly from the equation 

(B^{-1} A^{-1})(AB) = B^{-1}(A^{-1} A)B = B^{-1} B = I. 

Finally, taking B = A^{-1} in Eq. 3A.18 yields 

(A^{-1})^T = (A^T)^{-1}.  (3A.30) 
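
The cofactor formula of Eq. 3A.27 can be exercised with exact rational arithmetic. The sketch below is only an illustration: the function names `minor`, `det`, and `inverse` are not from the text, the determinant is computed by a plain first-row expansion (adequate for small matrices only), and the 3 x 3 matrix is assumed to be the example matrix of this appendix, with its second row read as (2, 1, 3), which is consistent with the stated determinant |A| = -10.

```python
from fractions import Fraction


def minor(A, row, col):
    """Matrix obtained from A by deleting one row and one column (0-based indices)."""
    return [[a for c, a in enumerate(r) if c != col]
            for i, r in enumerate(A) if i != row]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def inverse(A):
    """Inverse from Eq. 3A.27: b_ij = (-1)^(i+j) |A_ji| / |A|."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular (Eq. 3A.26)")
    k = len(A)
    # Note the interchanged indices: element (i, j) uses the minor with row j and column i deleted.
    return [[Fraction((-1) ** (i + j) * det(minor(A, j, i)), d) for j in range(k)]
            for i in range(k)]


A = [[1, 1, 0], [2, 1, 3], [1, 4, 1]]  # assumed reading of the example matrix
print(det(A))  # -10
print([[str(b) for b in row] for row in inverse(A)])
```

Multiplying `A` by the result in either order should reproduce the identity matrix, as Eq. 3A.22 requires.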


3.1 An elementary random process comprises four sample functions, to each 
of which is assigned equal probability. 

x(\omega_1, t) = 1,    x(\omega_3, t) = \sin \pi t, 

x(\omega_2, t) = -2,   x(\omega_4, t) = \cos \pi t. 

a. Is the process stationary? 

b. Calculate \overline{x(t)} and \overline{x(t_1) x(t_2)}. 

c. What is the probability of the set of sample functions passing through the 
windows of Fig. P3.1a? Of Fig. P3.1b? 

Figure P3.1 (a) (b) 

3.2 Let x = (x_1, x_2), where x_1 and x_2 are zero-mean Gaussian random 
variables. Assume for (a) and (b) that \overline{x_1^2} = \overline{x_2^2} = 1 and that x_1 and x_2 are 
statistically independent. 

■. a. Evaluate |x| 2 , [x| 2 . 

b. For each of the four accompanying figures, express the probability that x 
lies in the shaded region in terms of the function Q(\alpha), where 

Q(\alpha) \triangleq \int_{\alpha}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\beta^2/2} d\beta. 

c. Repeat (b) for Fig. P3.2a and b, with \overline{x_1^2} = 1, \overline{x_2^2} = 2, \overline{x_1 x_2} = — 

Figure P3.2 

3.3 Let x = (x_1, x_2, x_3) be a zero-mean Gaussian vector with covariance matrix 

\Lambda_x = \begin{pmatrix} 3 & 3 & 0 \\ 3 & 5 & 0 \\ 0 & 0 & 6 \end{pmatrix}. 

(This is a concise way of writing \overline{x_1^2} = \overline{x_1 x_2} = 3; \overline{x_2^2} = 5; \overline{x_3^2} = 6; \overline{x_1 x_3} = 
\overline{x_2 x_3} = 0.) 

a. Give an expression for p_x. [Observe that x_3 is statistically independent of 
the pair (x_1, x_2).] 

b. If y = x_1 + 2x_2 - x_3, determine p_y. 

c. If z = (z_1, z_2, z_3), determine p_z, where 

z_1 = 5x_1 - 3x_2 - x_3, 
z_2 = -x_1 + 3x_2 - x_3, 
z_3 = x_1 + x_3. 

d. Determine p_{x_1}(\alpha | x_2 = \beta). 

3.4 A channel is disturbed by two zero-mean jointly Gaussian noise processes, 
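n_1(t) and n_2(t). It is known that 

\mathcal{R}_i(\tau) \triangleq \overline{n_i(t) n_i(t - \tau)} = \frac{\sin \pi\tau}{\ldots};  i = 1, 2, 

\mathcal{R}_{12}(\tau) \triangleq \overline{n_1(t) n_2(t - \tau)} = \frac{\sin \pi\tau}{\ldots}. 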


Write the joint density function of the three random variables x_1, x_2, and x_3, 
where 

x_1 = n_1(t) |_{t=0}, 

x_2 = n_1(t) |_{t=1}, 

x_3 = n_2(t) |_{t=0}. 

3.5 Let x(t) and y(t) be statistically independent, stationary random processes 
and define z(t) = x(t) y(t). Is z(t) stationary? Show that 

\mathcal{S}_z(f) = \mathcal{S}_x(f) * \mathcal{S}_y(f), 

where, as usual, the symbol * denotes convolution. 

3.6 Let x(t), a Gaussian random process with mean function m_x(t) and 
covariance function \mathcal{L}_x(t, s), be passed through the filter shown in Fig. P3.6. 
Is the resulting process y(t) Gaussian? What are the mean and covariance 
functions of y(t)? Is y(t) stationary if x(t) is stationary? 



Figure P3.6 


3.7 A stationary zero-mean random process is the input to three linear filters, 
as shown in Fig. P3.7. The power density spectrum of x(t) is \mathcal{S}_x(f) = \mathcal{N}_0/2. 
The filter impulse responses are 

h_1(t) = \begin{cases} 1; & 0 < t < 1, \\ 0; & \text{elsewhere.} \end{cases} 

h_2(t) = \begin{cases} 2e^{-t}; & 0 < t, \\ 0; & \text{elsewhere.} \end{cases} 

h_3(t) = \begin{cases} \sqrt{2} \sin 2\pi t; & 0 < t < 2, \\ 0; & \text{elsewhere.} \end{cases} 

Figure P3.7 

a. Determine \overline{y_i(t)} and \overline{y_i^2(t)} for i = 1, 2, 3. 

b. Is there any pair of output processes for which \overline{y_i(t) y_j(t)} = 0 for all t? 

c. Is there any pair of output processes for which \overline{y_i(t) y_j(s)} = 0 for all t, s? 

3.8 A stationary Gaussian random process x(t) with mean m_x(t) = m_x and 
covariance function \mathcal{L}_x(\tau) is passed through two linear filters with impulse 
responses h(t) and g(t), yielding processes y(t) and z(t) as shown in Fig. P3.8. 

Figure P3.8 

a. What is the joint density function of the random variables y_1 = y(t_1) and 
z_2 = z(t_2)? 

b. Evaluate \overline{y(t) x(t - \tau)}. To what does this expression reduce when x(t) is 
white noise? 

c. What conditions on h(t) and g(t) are necessary and sufficient to ensure that 
y(t) and z(t) are statistically independent? 

d. If m_x = 0 and \mathcal{L}_x(\tau) = (\sin \pi\tau)^2 / (\pi\tau)^2, find the instantaneous power of 
y(t) when h(t) is an ideal filter with transfer function 

H(f) = \begin{cases} 1; & \frac{1}{2} < |f| < 1, \\ 0; & \text{elsewhere.} \end{cases} 

3.9 A zero-mean stationary Gaussian process with spectral density \mathcal{S}_x(f) is 
the input to a linear filter whose impulse response is shown in Fig. P3.9. A 
sample, y, is taken of the output process at time T. The random variable y is 
often referred to as the T-second time average of the process x(t). 

Figure P3.9 

a. Calculate \overline{y}. 

b. Calculate \sigma_y^2 in terms of \mathcal{S}_x(f) and T. 

c. Upper bound \sigma_y^2 under the condition \mathcal{S}_x(f) < S for all f. 

d. Derive a tight upper bound on P[|y - \overline{y}| > \epsilon] and contrast with the weak 
law of large numbers. 

3.10 It is desired to generate a stationary random process with the correlation 
function 

\mathcal{R}_x(\tau) = e^{-|\tau|},  (1) 

hence the power spectrum 

\mathcal{S}_x(f) = \frac{2}{1 + (2\pi f)^2}.  (2) 

We propose doing this in two ways: 

I. By setting x(t) = A \cos(2\pi f t + \theta), in which A, f, and \theta are statistically 
independent random variables. 

II. By taking 

x(t) = \int_{-\infty}^{\infty} h(\alpha) n(t - \alpha) d\alpha, 

in which n(t) is white noise and h(t) is some appropriate impulse response. 

a. Specify density functions p_A, p_f, and p_\theta that yield the desired power 
density spectrum of Eq. 2. Do you need to specify the density functions for 
A, f, and \theta completely or is specifying less statistical information about them 
sufficient? 

b. Pick h(t) to yield the spectrum of Eq. 2 via method II. 

c. Sketch a typical sample function generated by method I; by method II. 
Do you expect them to look similar? Explain. 

3.11 Let x and y be statistically independent Gaussian random variables, each 
with zero mean and unit variance. Define the (Gaussian) process 

z(t) = x \cos 2\pi t + y \sin 2\pi t. 

a. Determine the covariance function of z(t) and express p_{z_1, z_2} in terms of 
it, where z_i = z(t_i) for i = 1, 2. Is the process z(t) stationary? 

b. Define r = \sqrt{x^2 + y^2}, \theta = \tan^{-1}(x/y), and determine p_{r,\theta}. Note that 

z(t) = r \sin(2\pi t + \theta). 

c. Consider three random variables obtained from z(t) by sampling at times 
t = 0, … . Determine the covariance matrix of these variables. Does the 
inverse matrix exist? Explain. Use impulse functions to write the joint 
probability density function of these variables. 

3.12 Determine the correlation function \mathcal{R}_x(t, s) of the random process 

x(t) = \sum_{i=-\infty}^{\infty} w_i u(t - iT - \tau), 

where the {w_i} and \tau are statistically independent random variables with 

p_{w_i}(\alpha) = \frac{1}{2}[\delta(\alpha + 1) + \delta(\alpha - 1)];  all i, 

p_\tau(\alpha) = \begin{cases} 1/T; & 0 < \alpha < T, \\ 0; & \text{elsewhere.} \end{cases} 

The waveform u(t) is shown in Fig. P3.12a and a typical sample function 
appears in Fig. P3.12b. 

Figure P3.12 (a) The waveform u(t). (b) Typical sample function. 



3.13 The general expression for the mixed moments of N zero-mean jointly 
Gaussian random variables x_1, x_2, ..., x_N is 

\overline{x_{i_1} x_{i_2} \cdots x_{i_L}} = \begin{cases} 0; & L \text{ odd}, \\ \sum_{\text{all distinct pairs of subscripts}} \lambda_{j_1 j_2} \lambda_{j_3 j_4} \cdots \lambda_{j_{L-1} j_L}; & L \text{ even}, \end{cases} 

where, as usual for zero-mean variables, \lambda_{lk} = \overline{x_l x_k}. For example 

\overline{x_1 x_2 x_3 x_4} = \lambda_{12} \lambda_{34} + \lambda_{13} \lambda_{24} + \lambda_{14} \lambda_{23}, 

\overline{x_1 x_2 x_3} = 0. 

If some of the variables appear in the moment with a power of 2 or higher, the 
formula is to be applied by treating each repeated subscript as if it were distinct: 

\overline{x_1^2 x_2 x_3} = \lambda_{11} \lambda_{23} + \lambda_{12} \lambda_{13} + \lambda_{13} \lambda_{12} = \lambda_{11} \lambda_{23} + 2 \lambda_{12} \lambda_{13}; 

\overline{x_1^4} = 3 \lambda_{11}^2 = 3 \sigma_1^4. 

a. Evaluate \overline{x_1^2 x_2^2} and \overline{x_1^3 x_2} directly and by use of the formula. 

b. Apply the formula to \overline{x_1 x_2 x_3 x_4 x_5 x_6}. 

c. Verify that the number of terms entering into the formula for \overline{x_1 x_2 \cdots x_L}, L 
even, is 

(L - 1)(L - 3) \cdots (3)(1) = \frac{L!}{2^{L/2} (L/2)!}. 


d. Using (c), evaluate x/ J and x l i x 2 i . 

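e. Note that 

\left[ \overline{\left( \sum_{i=1}^{N} \nu_i x_i \right)^2} \right]^{L/2} = \left[ \sum_{i=1}^{N} \sum_{k=1}^{N} \nu_i \overline{x_i x_k} \nu_k \right]^{L/2};  L even, 

and prove the moment formula by expanding both \overline{\exp\left( j \sum_{i=1}^{N} \nu_i x_i \right)} and 
\exp\left( -\frac{1}{2} \sum_{i=1}^{N} \sum_{k=1}^{N} \nu_i \lambda_{ik} \nu_k \right) in a power series. First equate terms to obtain 

\overline{\left( \sum_{i=1}^{N} \nu_i x_i \right)^L} = \begin{cases} 0; & L \text{ odd}, \\ \frac{L!}{2^{L/2} (L/2)!} \left( \sum_{i=1}^{N} \sum_{k=1}^{N} \nu_i \lambda_{ik} \nu_k \right)^{L/2}; & L \text{ even}. \end{cases} 

Next equate coefficients of terms such as \nu_1 \nu_2 \cdots \nu_L on both sides of this 
expression. 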

3.14 In the circuit shown in Fig. P3.14, x(t) is a Gaussian random process with 
zero mean and correlation function 

\mathcal{R}_x(t, s) = \frac{2 \sin \pi(t - s)}{\pi(t - s)}. 

Find expressions [in terms of h(t) or H(f)] for m_y(t) and \mathcal{L}_y(t, s). Is y(t) 
wide-sense stationary? Hint. Use the results of Problem 3.13 and the 
convolution \leftrightarrow multiplication theorem of Fourier analysis. 

(Circuit of Fig. P3.14: a squarer producing x^2(t), followed by the filter h(t).) 


Figure P3.14 

3.15 The process 

x(t) = \sum_{i=-\infty}^{\infty} A \delta(t - \tau_i) 

is called a “Poisson impulse train” when the {\tau_i} are random variables so 
distributed that the probability P(n, T) of exactly n impulses occurring in any 
interval I of duration T is 

P(n, T) = \frac{(mT)^n}{n!} e^{-mT};  n = 0, 1, 2, ..., 

independent of the number of impulses arriving during all time intervals 
disjoint from I. We assume without loss of generality that \tau_i < \tau_{i+1}, all i, as 
indicated in Fig. P3.15. The parameter m is a positive constant. 

Figure P3.15 

a. Verify that 

\sum_{n=0}^{\infty} P(n, T) = 1. 

b. Let I consist of two subintervals with durations T_1 and T_2, and let n_1 and 
n_2 denote the number of impulses occurring in these subintervals. Use 
characteristic functions to verify that 

P[n_1 + n_2 = n] = P(n, T_1 + T_2). 

c. Let the random variable N denote the number of impulses occurring in any 
interval of T seconds duration. Evaluate \overline{N} and \sigma_N^2. 

d. Define the random variable 

l_i \triangleq \tau_{i+1} - \tau_i. 

Thus l_i is the length of time between the occurrence of the ith and (i + 1)st 
impulse. Determine \overline{l_i}, \overline{l_i^2}, and the probability density function p_{l_i}. Hint. First 
determine P[l_i < \alpha]. 

e. Repeat (d) for the random variable 

l_k \triangleq \tau_{i+k} - \tau_i, 

in which k is an arbitrary positive integer. 

3.16 The Poisson impulse train x(t) of Problem 3.15 is applied as input to a 
linear filter, as shown in Fig. P3.16a. When h(t) is chosen as shown in Fig. 
P3.16b, the output process is 

y(t) = A \sum_{i=-\infty}^{\infty} h(t - \tau_i) = \frac{A}{\Delta} N_\Delta, 

in which N_\Delta is the number of impulses occurring in the interval [t - \Delta, t]. 

Figure P3.16 (a), (b), (c) |\tau| < \Delta, (d) |\tau| > \Delta. 

a. Show that for this h(t) 

\overline{y(t_1) y(t_2)} = \mathcal{R}_y(\tau) = \begin{cases} \frac{A^2}{\Delta^2} \overline{(N_1 + N_2)(N_2 + N_3)}; & |\tau| < \Delta, \\ \frac{A^2}{\Delta^2} \overline{N_4 N_5}; & |\tau| > \Delta, \end{cases} 

in which \tau = t_2 - t_1 and N_i (i = 1, 2, 3, 4, 5) is the number of impulses occurring 
in the corresponding interval I_i shown in Fig. P3.16c and d. 

b. Use the results of Problem 3.15c to reduce the expression for \mathcal{R}_y(\tau) to 
the form 

\mathcal{R}_y(\tau) = \begin{cases} \ldots; & |\tau| < \Delta, \\ \overline{y}^2; & |\tau| > \Delta. \end{cases} 

c. Observe that the process y(t) tends to the process x(t) as \Delta \to 0 and 
verify that 

\mathcal{S}_x(f) = mA^2 + m^2 A^2 \delta(f). 

Prove as a consequence that, for a general filter h(t), 

\mathcal{S}_y(f) = mA^2 |H(f)|^2 + [mA H(0)]^2 \delta(f). 

This result, known as Campbell’s theorem, is exploited in (d) and (e). 

d. The process x(t), with A specialized to the electron charge q, is a good 
model for the emission of electrons from the cathode of a vacuum diode as long 
as the electrons do not interact with each other, which is a reasonable 
approximation with temperature-limited operation. Let q h(t), with 
\int_{-\infty}^{\infty} h(t) dt = H(0) = 1, denote the plate current increment due to a single 
electron emitted at t = 0. Then y(t) is the total diode current. 

Let I_{dc} denote the dc current and show that 

y(t) = I_{dc} + n(t), 

where n(t) is a zero-mean noise process with power spectrum 

\mathcal{S}_n(f) = q I_{dc} 

over the range of f for which H(f), the Fourier transform of h(t), is 
approximately constant. Argue that an implication is that the noise in a vacuum tube 
amplifier is not strictly signal independent except in the limit of arbitrarily small 
signal dynamic range. 

e. Both forward and reverse currents tend to flow simultaneously across the 
diffusion layer of a solid-state diode (or transistor) when bias is applied. In 
normal operation the forward current is composed chiefly of one type of carrier 
(say holes) and the reverse current of the other (say electrons). The Poisson 
impulse train x(t) provides a good model for both the forward and reverse 
diffusion individually, with A specialized to +q and -q, respectively. The 
resulting terminal current may be written 

y(t) = I_f - I_r + n(t), 

in which I_f and I_r denote the dc value of forward and reverse current and n(t) 
is again a zero-mean noise process. Assume that the forward and reverse 
diffusion processes are statistically independent and show that 

\mathcal{S}_n(f) = q(|I_f| + |I_r|) 

over the range of f for which the Fourier transform q H(f) of the diode’s response 
to the diffusion of a single charge-carrier at t = 0 is approximately constant. 

3.17 Determine the expression for Campbell’s theorem (cf. Problem 3.16) for 
the case, illustrated in Fig. P3.17, in which the filter input 

x(t) = \sum_i A_i \delta(t - \tau_i) 

is a Poisson impulse train with random amplitudes {A_i}. Assume the {A_i} are 
identically distributed random variables, with mean \overline{A} and second moment \overline{A^2}, 
which are statistically independent of each other and of the {\tau_i}. Hint. Show 
that 

\overline{y(t_1) y(t_2)} = \overline{A^2} E\left[ \sum_i h(t_1 - \tau_i) h(t_2 - \tau_i) \right] 
    + \overline{A}^2 E\left[ \sum_i \sum_{l \neq i} h(t_1 - \tau_i) h(t_2 - \tau_l) \right]. 

Figure P3.17 

3.18 Property 4 on p. 156 states that every weighted linear sum of jointly 
Gaussian random variables is a Gaussian random variable. 

a. Prove the converse statement that if 

y = \sum_{i=1}^{k} a_i x_i 

is a Gaussian random variable for every (nonzero) constant vector a = 
(a_1, a_2, ..., a_k), the {x_i} are jointly Gaussian. Hint. Calculate the joint 
characteristic function of the {x_i} by noting that 

M_x(\nu) = M_y(1) |_{a=\nu} = e^{-\frac{1}{2}\sigma_y^2} e^{j\overline{y}} 

and compare with Eq. 3.76 after evaluating \sigma_y^2 and \overline{y}. 

b. The converse statement may be taken as an alternate definition of jointly 
Gaussian random variables. Prove properties 2 and 4 (p. 156) directly from this 
definition without recourse to the multivariate characteristic function. Observe 
that with this alternate definition the multivariate central limit theorem can be 
reduced to a single-variable theorem. 

Optimum Receiver Principles 


The concepts and methods of random processes studied in Chapter 3, 
together with the a posteriori probability viewpoint of communication 
discussed in Chapter 2, provide the background necessary to treat the 
problem of optimum communication receiver design. In this chapter 
we apply this background to the particular communication system 
diagrammed in Fig. 4.1. Here one of a discrete set of specified waveforms 
{s_i(t)}, i = 0, 1, ..., M - 1, is transmitted over a channel disturbed by 
the addition of white Gaussian noise, so that the received signal process is 

r(t) = s(t) + n_w(t).  (4.1) 

Figure 4.1 Communication over an additive white Gaussian noise channel. 

Which waveform is actually transmitted depends on the random 
message input, m; when m = m_i, the transmitted signal is s_i(t). Thus 
the correspondence 

m = m_i \Leftrightarrow s(t) = s_i(t)  (4.2) 

defines the transmitter. The a priori probabilities {P[m_i]} specify the 
input source. 

The first part of this chapter is devoted to investigating how the received 
signal r(t) should be processed in order to produce an estimate, \hat{m}, of the 
transmitter input m that is optimum in the sense that the probability of 
error 

P[\mathcal{E}] \triangleq P[\hat{m} \neq m]  (4.3) 

is minimum. The investigation results in the determination of the optimum 
receiver structure; that is, in the specification of what operations to 
perform on r(t). 

In formulating the optimum receiver design problem, we assume that 
the a priori probabilities {P[m_i]} and signals {s_i(t)} are known. The 
chapter concludes with a discussion of how the minimum achievable 
probability of error depends on the choice of these a priori data. In 
particular, certain signal sets of practical importance are evaluated and 
compared. 

In Chapter 7 we extend the results of this chapter to the design and 
evaluation of optimum receivers for certain channels that disturb the 
transmitted signal in ways more complicated than by the simple addition 
of white Gaussian noise. 

4.1 BASIC APPROACH TO OPTIMUM RECEIVER DESIGN 

In Chapter 3 we have seen that the transmitted signal s(t), the disturbing 
noise n_w(t), and the received signal r(t) in Fig. 4.1 are random processes. 
In addition, we have seen that a random process is specified in terms of 
the joint density functions that it implies. The key to analyzing com- 
munication situations such as that in Fig. 4.1 is to find some way to 
replace all waveforms by finite dimensional vectors , for which we can then 
calculate the joint density function. We show in Section 4.3 and Appendix 
4A that this replacement is permissible. As a preliminary, however, it is 
convenient first to establish the operations performed by an optimum 
receiver under the assumption that the replacement of waveforms by 
vectors has already been accomplished. 


4.2 VECTOR CHANNELS 

The N-dimensional vector communication system diagrammed in Fig. 4.2 
is a straightforward extension of the single random variable system 
discussed in Chapter 2 in connection with Fig. 2.34. The transmitter is 
defined by a set of M signal vectors, {s_i}. When m = m_i, the vector s_i is 
transmitted, 

s_i = (s_{i1}, s_{i2}, ..., s_{iN});  i = 0, 1, ..., M - 1.  (4.4a) 

The vector channel disturbs the transmission and emits a random vector 

r = (r_1, r_2, ..., r_N).  (4.4b) 

We consider a vector channel to be defined mathematically if and only if 
the entire set of M conditional density functions {p_r( | s = s_i)} is known. 
For brevity, we follow the usage of Eq. 2.104c and denote this set by p_{r|s}. 

For our vector communication system the optimum receiver is specified 
as follows: given that any particular vector, say r = \rho, is received, 
where 

\rho = (\rho_1, \rho_2, ..., \rho_N),  (4.5) 

the optimum receiver must determine from its knowledge of {s_i}, {P[m_i]}, and 
p_{r|s} which one of the possible transmitter inputs {m_i} has maximum a 
posteriori probability. More precisely, the optimum receiver sets \hat{m} = m_k 
whenever 

P[m_k | r = \rho] > P[m_i | r = \rho];  for i = 0, 1, ..., M - 1, i \neq k.  (4.6) 

Proof that such a maximum a posteriori probability receiver is in fact 
optimum follows from noting that when the receiver sets \hat{m} = m_k, the 
conditional probability of a correct decision, given that r = \rho, is 

P[\mathcal{C} | r = \rho] = P[m_k | r = \rho].  (4.7a) 

The unconditional probability of correct decision can be written 

P[\mathcal{C}] = \int_{-\infty}^{\infty} P[\mathcal{C} | r = \rho] p_r(\rho) d\rho.  (4.7b) 

Since 

p_r(\rho) \geq 0, 

it is clear that P[\mathcal{C}] is maximized by maximizing P[\mathcal{C} | r = \rho] for each 
received vector \rho. If two or more m_i yield the same a posteriori probability, 
the receiver may select \hat{m} from among them in any arbitrary way (for 
instance, by choosing the one with the smallest index) without affecting 
the probability of error. 

Figure 4.2 A vector communication system. 

Determination of the a posteriori probabilities {P[m_i | r = \rho]} follows 
from the mixed form of Bayes rule, Eq. 2.103a: 

P[m_i | r = \rho] = \frac{P[m_i] p_r(\rho | m_i)}{p_r(\rho)}.  (4.8a) 

Since the event m = m_i implies the event s = s_i, and conversely, we have 

p_r(\rho | m_i) = p_r(\rho | s = s_i).  (4.8b) 

Finally, since p_r(\rho) is independent of the index i, we conclude from Eqs. 
4.6 and 4.8 that the optimum receiver, on observing r = \rho, sets \hat{m} = m_k 
whenever the decision function 

P[m_i] p_r(\rho | s = s_i);  i = 0, 1, ..., M - 1,  (4.9) 

is maximum for i = k. 

A receiver that determines \hat{m} by maximizing only the factor p_r(\rho | s = s_i) 
without regard to the factor P[m_i] is called a maximum-likelihood receiver. 
Such a receiver is often used when the a priori probabilities {P[m_i]} 
are not known. A maximum-likelihood receiver yields the minimum 
probability of error when the transmitter inputs are all equally likely. 
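
The decision function of Eq. 4.9 applies to any channel for which the conditional densities are known. The sketch below is illustrative only: the Laplacian-noise likelihood, its scale `b`, the a priori probabilities, and the received point are hypothetical choices made for the example and do not come from the text; the signal vectors are those of Eq. 4.10.

```python
import math


def map_decision(rho, signals, priors, likelihood):
    """Return the index k that maximizes P[m_i] * p_r(rho | s = s_i) (Eq. 4.9)."""
    scores = [p * likelihood(rho, s) for s, p in zip(signals, priors)]
    # max() keeps the first maximum, so ties go to the smallest index.
    return max(range(len(scores)), key=scores.__getitem__)


def ml_decision(rho, signals, likelihood):
    """Maximum-likelihood rule: the same maximization with the priors ignored."""
    return map_decision(rho, signals, [1.0] * len(signals), likelihood)


def laplacian_likelihood(rho, s, b=1.0):
    """Hypothetical channel: independent Laplacian noise of scale b on each component."""
    l1 = sum(abs(r - x) for r, x in zip(rho, s))
    return math.exp(-l1 / b) / (2.0 * b) ** len(s)


signals = [(1, 2), (2, 1), (1, -2)]  # signal vectors of Eq. 4.10
priors = [0.8, 0.1, 0.1]             # assumed a priori probabilities
rho = (1.6, 1.4)                     # assumed received vector

print(ml_decision(rho, signals, laplacian_likelihood))           # 1
print(map_decision(rho, signals, priors, laplacian_likelihood))  # 0
```

The received point is nearer to s_1 under this likelihood, so the maximum-likelihood rule picks m_1, while the strong prior on m_0 tips the maximum a posteriori rule toward m_0.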

Decision Regions 

The nature of the optimum vector receiver may be clarified by 
considering the two-dimensional example shown in Fig. 4.3a, wherein the 
vectors are described in terms of coordinates (\varphi_1 and \varphi_2). We assume 
three possible input messages, with known a priori probabilities P[m_0], 
P[m_1], and P[m_2]. The corresponding transmitted vectors are assumed to 
be 

s_0 = (1, 2), 
s_1 = (2, 1),  (4.10) 
s_2 = (1, -2). 

If we now receive some point r = \rho, as shown, the receiver can calculate 
P[m_i] p_r(\rho | s = s_i) from knowledge of the functions p_{r|s} which define the 
channel and thereby determine \hat{m} in accordance with the preceding 
discussion. 

We note that this calculation can be carried out for every point \rho in 
the (\varphi_1, \varphi_2) plane and that each such point is thereby assigned to one and 
only one of the possible inputs {m_i}. Thus the decision rule of Eq. 4.9 
implies a partitioning of the entire plane into disjoint regions, say {I_i}, 
i = 0, 1, 2, similar in general to those shown in Fig. 4.3b. Each region 
comprises all points such that whenever the received vector r is in I_k the 
optimum receiver sets \hat{m} equal to m_k. The correspondence 

r in I_k \Leftrightarrow \hat{m} = m_k  (4.11) 

defines the optimum receiver. 

The regions {I_i} are called optimum decision regions and are a natural 
extension of the decision intervals considered in Fig. 2.35. We note for 
future reference that the optimum receiver makes an error when m = m_k 
if and only if r falls outside I_k. 

It is clear that the concept of decision regions, which for simplicity we 
have illustrated for a two-dimensional plane, extends directly to the case 
of an arbitrary number of possible inputs {m_i} and to corresponding 
signals {s_i} that are defined on an arbitrary number of dimensions. The 
decision function of Eq. 4.9 then implies a partitioning of an N-dimensional 
received signal space into M disjoint N-dimensional decision regions {I_i}. 

Figure 4.3 A three-signal vector communication problem: (a) three two-dimensional 
signal vectors and a possible received signal \rho; (b) decision regions. 

Additive Gaussian Noise 

The actual boundaries of the decision regions in any particular case 
depend by Eq. 4.9 on the a priori probabilities {P[m_i]}, the signals {s_i}, 
and the definition of the channel p_{r|s}. In some instances the calculation 
of these boundaries may be simple; in most it is exceedingly difficult. 
Fortunately, many situations of practical interest fall into the simple 
category. 

To illustrate a relatively straightforward situation, consider the case in 
which the channel disturbs the signal vector (as shown in Fig. 4.4) simply 
by adding to it a random noise vector 

n = (n_1, n_2, ..., n_N).  (4.12) 

Figure 4.4 An N-dimensional vector communication system. 

The random signal vector s = (s_1, s_2, ..., s_N) and received vector r are 
then related by 

r = s + n = (s_1 + n_1, s_2 + n_2, ..., s_N + n_N).  (4.13) 

Since Eq. 4.13 implies that r = \rho when s = s_i if and only if n = \rho - s_i, 
the conditional density functions p_{r|s} are given by 

p_r(\rho | s = s_i) = p_n(\rho - s_i | s = s_i);  i = 0, 1, ..., M - 1.  (4.14) 

We now make the often-reasonable assumption that n and s are 
statistically independent (cf. Eq. 2.104): 

p_{n|s} = p_n.  (4.15a) 

Hence 

p_n(\rho - s_i | s = s_i) = p_n(\rho - s_i);  all i.  (4.15b) 

The decision function of Eq. 4.9 is therefore 

P[m_i] p_n(\rho - s_i).  (4.16) 

In order to simplify the decision function still more, we must specify 
the noise density function p_n. An especially simple and important case 
is that in which the N components of n are statistically independent, 
zero-mean, Gaussian random variables, each with variance \sigma^2. From 
Eq. 3.57 we then have 

p_n(\alpha) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{j=1}^{N} \alpha_j^2 \right).  (4.17a) 

The notation can be contracted by observing that the squared-length of 
any vector \alpha is defined to be the dot product of \alpha with itself. In the familiar 
case of N = 2 or 3 we have 

|\alpha|^2 \triangleq \alpha \cdot \alpha = \sum_{j=1}^{N} \alpha_j^2,  (4.17b) 

where the {\alpha_j} are the Cartesian coordinates of \alpha. For larger N length is 
defined in the same way and Eq. 4.17b remains valid. Thus Eq. 4.17a can 
be written 

p_n(\alpha) = \frac{1}{(2\pi\sigma^2)^{N/2}} e^{-|\alpha|^2 / 2\sigma^2}.  (4.17c) 

Substituting Eq. 4.17c in Eq. 4.16, we see that for this p n the optimum 
receiver sets m — m k whenever 

P[m,] <?-!<>— s --l 2 /2* 2 (4.18) 

is maximum for i = k. [The factor (27rcr 2 )“' v/2 is independent of /and its 
discard entails no loss of optimality.] Finally, we note that maximizing 
the expression of Eq. 4.18 is equivalent to finding that value of i which 


minimizes 


Ip ~ sj 2 — 2a 2 In P[ mJ. 


The decision function of Eq. 4.19 is easily visualized geometrically.
We recognize that the term $|\boldsymbol\rho - \mathbf s_i|^2$ is the square of the Euclidean
distance between the points $\boldsymbol\rho$ and $\mathbf s_i$:

$$|\boldsymbol\rho - \mathbf s_i|^2 = \sum_{j=1}^{N}(\rho_j - s_{ij})^2.$$

Whenever all $m_i$ have equal a priori probability, the optimum decision
rule is to assign a received point $\boldsymbol\rho$ to $m_i$ if and only if it is closer to the
point $\mathbf s_i$ than to any other possible signal. For example, consider the
two-dimensional signal set of Eq. 4.10. If all three messages are equally
probable, the decision regions are those shown in Fig. 4.5a; when the
three messages have unequal a priori probabilities, the decision regions
are modified in accordance with Eq. 4.19, as indicated in Fig. 4.5b.
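The rule of Eq. 4.19 can be sketched in a few lines of code. The following is a minimal illustration, not taken from the text: the function name `decide`, the three-point signal set, and the probability values are all hypothetical choices made for the example (the signal set is not that of Eq. 4.10).

```python
import math

def decide(rho, signals, priors, sigma2):
    """Return the index i that minimizes |rho - s_i|^2 - 2*sigma^2*ln P[m_i] (Eq. 4.19)."""
    def metric(i):
        dist2 = sum((r - s) ** 2 for r, s in zip(rho, signals[i]))
        return dist2 - 2.0 * sigma2 * math.log(priors[i])
    return min(range(len(signals)), key=metric)

# Hypothetical two-dimensional signal set and a received point.
signals = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]
rho = (-0.1, 0.0)

# Equal priors: the rule reduces to "pick the nearest signal point".
choice_equal = decide(rho, signals, [1 / 3, 1 / 3, 1 / 3], sigma2=1.0)   # -> 1

# Strongly unequal priors displace the boundaries toward the unlikely signals.
choice_skewed = decide(rho, signals, [0.98, 0.01, 0.01], sigma2=1.0)     # -> 0
```

With equal priors the logarithmic term is the same for every i and drops out of the comparison; with unequal priors the same received point can fall in a different decision region, which is the boundary displacement described for Fig. 4.5b.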

Once the decision regions $\{I_i\}$ have been determined, an expression for
the conditional probability of correct decision follows immediately:

$$P[C \mid m_i] = P[\mathbf r \text{ in } I_i \mid m_i] = \int_{I_i} p_{\mathbf r}(\boldsymbol\rho \mid \mathbf s = \mathbf s_i)\, d\boldsymbol\rho. \tag{4.20a}$$

Figure 4.5 Optimum decision regions for additive Gaussian noise: (a) the boundaries
of the $\{I_i\}$ are the perpendicular bisectors of the sides of the signal triangle when
$P[m_0] = P[m_1] = P[m_2]$; (b) the boundaries of the $\{I_i\}$ are displaced when the
a priori probabilities are unequal.

For additive equal-variance Gaussian noise this becomes

$$P[C \mid m_i] = \int_{I_i} p_{\mathbf n}(\boldsymbol\rho - \mathbf s_i)\, d\boldsymbol\rho. \tag{4.20b}$$

The over-all probability of error is

$$P[\mathcal E] = 1 - P[C] = 1 - \sum_{i=0}^{M-1} P[m_i]\, P[C \mid m_i]. \tag{4.20c}$$

In Section 4.4 these expressions are evaluated for certain (important)
situations in which the decision regions are such that the integrals can be
easily calculated or approximated.
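When the integrals of Eq. 4.20 are awkward, the error probability can be estimated by simulation. The sketch below is an illustration under stated assumptions rather than a method from the text: the helper names `decide` and `estimate_error`, the one-dimensional antipodal signal pair, and the trial count are hypothetical. For that particular pair with unit noise variance, the known result is the Gaussian tail probability Q(1), which the code evaluates with `math.erfc` for comparison.

```python
import math
import random

def decide(rho, signals, priors, sigma2):
    """Decision rule of Eq. 4.19."""
    def metric(i):
        dist2 = sum((r - s) ** 2 for r, s in zip(rho, signals[i]))
        return dist2 - 2.0 * sigma2 * math.log(priors[i])
    return min(range(len(signals)), key=metric)

def estimate_error(signals, priors, sigma, trials, seed=0):
    """Monte Carlo estimate of the over-all error probability (Eq. 4.20c)."""
    rng = random.Random(seed)
    indices = range(len(signals))
    errors = 0
    for _ in range(trials):
        i = rng.choices(indices, weights=priors)[0]             # transmitted message
        rho = [s + rng.gauss(0.0, sigma) for s in signals[i]]   # received vector s_i + n
        if decide(rho, signals, priors, sigma * sigma) != i:
            errors += 1
    return errors / trials

# Hypothetical example: two antipodal signals at +1 and -1, sigma = 1.
signals = [(1.0,), (-1.0,)]
estimate = estimate_error(signals, [0.5, 0.5], sigma=1.0, trials=20000)
theory = 0.5 * math.erfc(1.0 / math.sqrt(2.0))   # Q(1), about 0.159
```

The estimate fluctuates from seed to seed, so it should only be expected to agree with the closed-form value to within the usual sampling error.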

Multivector Channels 

In the “diversity” communication system shown in Fig. 4.6, in which
the transmitted vector $\mathbf s$ is applied at the input of two different channels
and the receiver observes the output of both, it is natural to describe the
total receiver input $\mathbf r$ in terms of vectors $\mathbf r_1$ and $\mathbf r_2$ that are associated
with each channel individually. Thus we write

$$\mathbf r = (\mathbf r_1, \mathbf r_2) = (r_{11}, r_{12}, \ldots, r_{1k}, r_{21}, r_{22}, \ldots, r_{2l}), \tag{4.21a}$$

where

$$\mathbf r_1 = (r_{11}, r_{12}, \ldots, r_{1k}), \tag{4.21b}$$

$$\mathbf r_2 = (r_{21}, r_{22}, \ldots, r_{2l}). \tag{4.21c}$$

Figure 4.6 A “diversity” vector communication system. [In many situations the
vectors $\mathbf s$, $\mathbf r_1$, and $\mathbf r_2$ all have the same number of components, but this need not be so.]

Given that vectors $\mathbf r_1 = \boldsymbol\rho_1$ and $\mathbf r_2 = \boldsymbol\rho_2$ are received, the a posteriori
probability of the $i$th message is

$$P[m_i \mid \mathbf r = \boldsymbol\rho] = P[m_i \mid \mathbf r_1 = \boldsymbol\rho_1, \mathbf r_2 = \boldsymbol\rho_2], \tag{4.22a}$$

where $\boldsymbol\rho = (\boldsymbol\rho_1, \boldsymbol\rho_2)$. With this notation, the optimum decision rule of
Eq. 4.9 is written: set $\hat m = m_k$ if and only if

$$P[m_i]\, p_{\mathbf r}(\boldsymbol\rho \mid \mathbf s = \mathbf s_i) = P[m_i]\, p_{\mathbf r_1, \mathbf r_2}(\boldsymbol\rho_1, \boldsymbol\rho_2 \mid \mathbf s = \mathbf s_i) \tag{4.22b}$$

is maximum for $i = k$.

The theorem of irrelevance. In many cases of practical importance a
channel presents some data at its output which an optimum receiver can
ignore. For instance, consider the arbitrary vector channel in Fig. 4.7,
in which two inputs $\mathbf r_1$ and $\mathbf r_2$ are available to the receiver. Let us determine
the conditions under which the receiver may disregard $\mathbf r_2$ without affecting
the probability of error.

Figure 4.7 An arbitrary vector communication system described in terms of two output
vectors.

The optimum decision rule is again given by Eq. 4.22b. If we factor the
right-hand side of this equation in accordance with Bayes rule (Eq. 2.103),
we see that an optimum receiver sets $\hat m = m_k$ following the observation
$\mathbf r_1 = \boldsymbol\rho_1$, $\mathbf r_2 = \boldsymbol\rho_2$ if and only if the decision function

$$P[m_i]\, p_{\mathbf r_1}(\boldsymbol\rho_1 \mid \mathbf s = \mathbf s_i)\, p_{\mathbf r_2}(\boldsymbol\rho_2 \mid \mathbf s = \mathbf s_i, \mathbf r_1 = \boldsymbol\rho_1) \tag{4.23}$$

is maximum for $i = k$. If $\mathbf r_2$ when conditioned on $\mathbf r_1$ is statistically
independent of $\mathbf s$, then for every value of $\boldsymbol\rho_2$

$$p_{\mathbf r_2}(\boldsymbol\rho_2 \mid \mathbf s = \mathbf s_i, \mathbf r_1 = \boldsymbol\rho_1) = p_{\mathbf r_2}(\boldsymbol\rho_2 \mid \mathbf r_1 = \boldsymbol\rho_1) = \text{a number independent of } i. \tag{4.24}$$

When this is so, the knowledge that $\mathbf r_2 = \boldsymbol\rho_2$ can never enter into the
determination of which value of $i$ maximizes the expression of Eq. 4.23;
an optimum receiver may therefore totally ignore $\mathbf r_2$. Thus we have the
important theorem of irrelevance: an optimum receiver may disregard a
vector $\mathbf r_2$ if and only if

$$p_{\mathbf r_2 \mid \mathbf r_1, \mathbf s} = p_{\mathbf r_2 \mid \mathbf r_1}. \tag{4.25a}$$

Equation 4.25a is a necessary and sufficient condition for ignoring $\mathbf r_2$. A
sufficient condition is that

$$p_{\mathbf r_2 \mid \mathbf r_1, \mathbf s} = p_{\mathbf r_2}. \tag{4.25b}$$

The meaning and utility of this theorem may be demonstrated by
considering three examples, each of which involves two additive noise
vectors $\mathbf n_1$ and $\mathbf n_2$ that are statistically independent of one another and of
$\mathbf s$. The first example, shown in Fig. 4.8, illustrates a situation in which
Eq. 4.25b is valid: the received vector $\mathbf r_2$ is just the noise $\mathbf n_2$, which is
statistically independent of both $\mathbf n_1$ and $\mathbf s$, hence of $\mathbf s$ and $\mathbf r_1 = \mathbf n_1 + \mathbf s$.
Accordingly,

$$p_{\mathbf r_2 \mid \mathbf r_1, \mathbf s} = p_{\mathbf r_2} \tag{4.26}$$

and $\mathbf r_2$ is irrelevant, which is obviously sensible.

Figure 4.8 The vector $\mathbf r_2$ is irrelevant because $p_{\mathbf r_2 \mid \mathbf r_1, \mathbf s} = p_{\mathbf r_2}$.

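The second example, shown in Fig. 4.9, illustrates a situation in which
Eq. 4.25a is valid but Eq. 4.25b is not. We have two vector channels in
cascade and a receiver that has access to the intermediate output $\mathbf r_1$ as well
as to the final output $\mathbf r_2$. Since $\mathbf r_2$ is a corrupted version of $\mathbf r_1$, hence
depends on $m$ only through $\mathbf r_1$, we feel intuitively that $\mathbf r_2$ can tell us nothing
about $\mathbf s$ that is not already conveyed by $\mathbf r_1$. We prove this formally by
noting that, since $\mathbf r_2 = \mathbf r_1 + \mathbf n_2$, when $\mathbf r_1$ is known $\mathbf r_2$ depends only on the
noise $\mathbf n_2$, which is independent of $\mathbf s$. Thus for all $\boldsymbol\rho_2$ and $i$

$$p_{\mathbf r_2}(\boldsymbol\rho_2 \mid \mathbf r_1 = \boldsymbol\rho_1, \mathbf s = \mathbf s_i) = p_{\mathbf n_2}(\boldsymbol\rho_2 - \boldsymbol\rho_1) = p_{\mathbf r_2}(\boldsymbol\rho_2 \mid \mathbf r_1 = \boldsymbol\rho_1).$$

The condition of Eq. 4.25a is satisfied, and the theorem of irrelevance
states that $\mathbf r_2$ is of no value to an optimum receiver.

Figure 4.9 The vector $\mathbf r_2$ is irrelevant because $p_{\mathbf r_2 \mid \mathbf r_1, \mathbf s} = p_{\mathbf r_2 \mid \mathbf r_1}$. [The two
channels in cascade are labeled Channel No. 1 and Channel No. 2.]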

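The third example, shown in Fig. 4.10, illustrates a situation in which
$\mathbf r_2$ cannot be discarded by an optimum receiver. We have

$$\begin{aligned}
p_{\mathbf r_2}(\boldsymbol\rho_2 \mid \mathbf r_1 = \boldsymbol\rho_1, \mathbf s = \mathbf s_i) &= p_{\mathbf r_2}(\boldsymbol\rho_2 \mid \mathbf n_1 = \boldsymbol\rho_1 - \mathbf s_i, \mathbf s = \mathbf s_i) \\
&= p_{\mathbf n_2}(\boldsymbol\rho_2 - \boldsymbol\rho_1 + \mathbf s_i \mid \mathbf n_1 = \boldsymbol\rho_1 - \mathbf s_i, \mathbf s = \mathbf s_i) \\
&= p_{\mathbf n_2}(\boldsymbol\rho_2 - \boldsymbol\rho_1 + \mathbf s_i),
\end{aligned}$$

which does depend explicitly on $i$. Thus Eq. 4.25 is not satisfied and $\mathbf r_2$
is not irrelevant, even though $\mathbf r_2$ and $\mathbf s$ are pairwise independent. This is
clearly sensible, since (as an extreme case) knowledge of $\mathbf r_2$ provides a good
estimate of $\mathbf n_1$, hence of $\mathbf s$, when $p_{\mathbf n_2}$ is such that with high probability $\mathbf n_2$
is very small compared to $\mathbf n_1$.

Figure 4.10 The vector $\mathbf r_2$ is not irrelevant.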
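The third example can be checked numerically. The sketch below assumes the scalar arrangement implied by the derivation above, namely r1 = s + n1 and r2 = n1 + n2, with two equally likely signal values; the function name `simulate`, the noise standard deviations, and the simple rule "decide on the sign of r1 − r2" are hypothetical choices made for illustration. That second rule is not claimed to be the optimum receiver; it only shows that r2 carries usable information.

```python
import random

def simulate(trials=20000, sigma1=1.0, sigma2=0.1, seed=1):
    """Compare a receiver that ignores r2 with one that uses it (scalar case)."""
    rng = random.Random(seed)
    errors_r1_only = 0
    errors_with_r2 = 0
    for _ in range(trials):
        s = rng.choice((-1.0, 1.0))        # transmitted signal value
        n1 = rng.gauss(0.0, sigma1)
        n2 = rng.gauss(0.0, sigma2)        # small compared with n1
        r1 = s + n1
        r2 = n1 + n2
        if (r1 >= 0.0) != (s > 0.0):       # decision from r1 alone
            errors_r1_only += 1
        if (r1 - r2 >= 0.0) != (s > 0.0):  # r1 - r2 = s - n2 removes most of n1
            errors_with_r2 += 1
    return errors_r1_only / trials, errors_with_r2 / trials

p_r1_only, p_with_r2 = simulate()
```

Because r1 − r2 equals s − n2, the second receiver sees only the small noise n2, and its error rate is far below that of the receiver that discards r2.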

The theorem of reversibility. An important corollary of the theorem
of irrelevance is the theorem of reversibility, which states that the minimum
attainable probability of error is not affected by the introduction of a
reversible operation at the output of a channel, as in Fig. 4.11a. As
indicated in Fig. 4.11b, an operation $G$ is reversible if the input $\mathbf r_2$ can be
exactly recovered from the output $\mathbf r_1$. In such a case it is obvious that

$$p_{\mathbf r_2 \mid \mathbf r_1, \mathbf s} = p_{\mathbf r_2 \mid \mathbf r_1},$$

so that Eq. 4.25a is satisfied, $\mathbf r_2$ may be discarded, and the theorem is
proved. An alternative proof follows from noting that a receiver for $\mathbf r_1$
can be built which first recovers $\mathbf r_2$, as shown in Fig. 4.11c, and then
operates on $\mathbf r_2$ to determine $m$.

Figure 4.11 Insertion of a reversible operation, $G$, between channel and receiver.
The operation inverse to $G$ is denoted $G^{-1}$. For example, $G$ might be the addition, and
$G^{-1}$ the subtraction, of a fixed vector $\mathbf a$.

4.3 WAVEFORM CHANNELS

The foregoing discussion of irrelevance provides the analytic tool that
is required in order to replace the waveform communication problem of
Fig. 4.1 by an equivalent vector communication problem. We therefore
return to consideration of this figure, in which the received waveform $r(t)$
is given by

$$r(t) = s(t) + n_w(t) \tag{4.27}$$

and $n_w(t)$ is a zero-mean white Gaussian noise process with power density

$$\mathcal S_w(f) = \frac{\mathcal N_0}{2}; \qquad -\infty < f < \infty. \tag{4.28}$$

We first represent the signal process $s(t)$ in an equivalent vector form and
then show that the relevant noise process may also be represented by a
random vector.


Waveform Synthesis 

A convenient way to synthesize the signal set $\{s_i(t)\}$ at the transmitter of
Fig. 4.1 is shown in Fig. 4.12. A set of $N$ filters is used, with the impulse
response of the $j$th filter denoted by $\varphi_j(t)$. When the transmitter input is
$m_i$, the first filter is excited by an impulse of value $s_{i1}$, the second filter by
an impulse of value $s_{i2}$, and so on, with the $N$th filter excited by an impulse
of value $s_{iN}$. The filter outputs are summed to yield $s_i(t)$. Thus the
transmitted waveform is one of the $M$ signals

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\, \varphi_j(t); \qquad i = 0, 1, \ldots, M - 1. \tag{4.29}$$

Figure 4.12 Signal synthesis (transmitter). The output $s_i(t)$ depends on $i$ through the
choice of the impulse weighting coefficients $\{s_{ij}\}$.















For ease of analysis we assume that the $N$ “building-block” waveforms
$\{\varphi_j(t)\}$ are orthonormal, by which we mean

$$\int_{-\infty}^{\infty} \varphi_j(t)\, \varphi_l(t)\, dt = \delta_{jl} \triangleq \begin{cases} 1; & j = l, \\ 0; & j \neq l, \end{cases} \tag{4.30}$$

for all $j$ and $l$, $1 \le j, l \le N$.

We shall soon see that the error performance which can be achieved
with signal sets generated in this way is completely independent of the
actual waveshapes chosen for the $\{\varphi_j(t)\}$; only the coefficients $\{s_{ij}\}$ and
the noise power density $\mathcal N_0/2$ affect the minimum attainable $P[\mathcal E]$. Thus
the $\{\varphi_j(t)\}$ may be chosen for engineering convenience. In application,
one frequently encounters the set of time-translated pulses

Figure 4.13 Examples of orthonormal waveforms: (a) orthonormal time-translated
pulses; (b) orthonormal frequency-translated pulses.

$$\varphi_j(t) = g(t - j\tau); \qquad j = 1, 2, \ldots, N, \tag{4.31a}$$

shown in Fig. 4.13a, where $g(t)$ is the unit energy pulse

$$g(t) = \begin{cases} 1/\sqrt{\tau}; & -\tau < t < 0, \\ 0; & \text{elsewhere}. \end{cases} \tag{4.31b}$$

A second common example is the set of frequency-translated pulses

$$\varphi_j(t) = \begin{cases} \sqrt{\dfrac{2}{T}}\, \sin 2\pi \dfrac{j}{T} t; & 0 \le t \le T, \\ 0; & \text{elsewhere}, \end{cases} \qquad j = 1, 2, \ldots, N, \tag{4.32}$$

shown in Fig. 4.13b. It may be readily verified that both sets of waveforms
satisfy the orthonormality condition of Eq. 4.30. [The prefix “ortho”
comes from “orthogonal,” meaning that the integral of $\varphi_j(t)\, \varphi_l(t)$ is
zero whenever $j \neq l$; the suffix “normal” means that the integral is unity
whenever $j = l$.]

It may seem restrictive at first to consider only waveforms $\{s_i(t)\}$ that
are constructed in accordance with Eq. 4.29. This is not so; any set of
$M$ finite-energy waveforms can be synthesized in this way. This, and the fact
that the number of filters required to do so never exceeds $M$, is proved in
Appendix 4A. It follows that there is no loss of generality entailed in
considering only transmitters that operate as shown in Fig. 4.12.


Geometric Interpretation of Signals 

Once a convenient set of orthonormal functions $\{\varphi_j(t)\}$ has been
adopted, each of the transmitter waveforms $\{s_i(t)\}$ is completely determined
by the vector of its coefficients:

$$\mathbf s_i = (s_{i1}, s_{i2}, \ldots, s_{iN}); \qquad i = 0, 1, \ldots, M - 1. \tag{4.33}$$

As usual, we visualize the $M$ vectors $\{\mathbf s_i\}$ as defining $M$ points in an
$N$-dimensional geometric space, called the signal space, with $N$ mutually
perpendicular axes labeled $\varphi_1, \varphi_2, \ldots, \varphi_N$. If we let $\boldsymbol\varphi_j$ denote the unit
vector along the $j$th axis, $j = 1, 2, \ldots, N$, each $N$-tuple in Eq. 4.33
denotes the vector

$$\mathbf s_i = s_{i1} \boldsymbol\varphi_1 + s_{i2} \boldsymbol\varphi_2 + \cdots + s_{iN} \boldsymbol\varphi_N. \tag{4.34}$$

The idea of visualizing transmitter signals geometrically is of fundamental
importance. For example, Fig. 4.3 (which we have already
considered) represents a two-dimensional space with three signals: $N = 2$,
$M = 3$. As another example, consider the set of two orthonormal functions

Figure 4.14 Four signals in a two-dimensional signal space. Each vector $\mathbf s_i$ is located
a distance $\sqrt{E_s}$ from the origin.


$$\varphi_1(t) = \begin{cases} \sqrt{\dfrac{2}{T}}\, \sin 2\pi f_0 t; & 0 \le t \le T, \\ 0; & \text{elsewhere}, \end{cases} \tag{4.35a}$$

$$\varphi_2(t) = \begin{cases} \sqrt{\dfrac{2}{T}}\, \cos 2\pi f_0 t; & 0 \le t \le T, \\ 0; & \text{elsewhere}, \end{cases} \tag{4.35b}$$

where $f_0$ is an integral multiple of $1/T$. If we choose

$$\begin{aligned}
\mathbf s_0 &= (0, \sqrt{E_s}) \\
\mathbf s_1 &= (-\sqrt{E_s}, 0) \\
\mathbf s_2 &= (0, -\sqrt{E_s}) \\
\mathbf s_3 &= (\sqrt{E_s}, 0),
\end{aligned} \tag{4.36}$$

the vector diagram of Fig. 4.14 represents the set of four phase-modulated
transmitter waveforms

$$s_i(t) = \begin{cases} \sqrt{\dfrac{2E_s}{T}}\, \cos\left(2\pi f_0 t + i\dfrac{\pi}{2}\right); & 0 \le t \le T, \\ 0; & \text{elsewhere}, \end{cases} \qquad i = 0, 1, 2, 3, \tag{4.37a}$$

where

$$E_s = \int_{-\infty}^{\infty} s_i^2(t)\, dt; \qquad i = 0, 1, 2, 3 \tag{4.37b}$$

is the energy dissipated if $s_i(t)$ is a voltage across a 1-ohm load. Similarly,
if $\varphi_1(t)$ and $\varphi_2(t)$ are two nonoverlapping unit pulses, the vectors of Eq.
4.36 and the diagram of Fig. 4.14 represent the four entirely different
waveforms shown in Fig. 4.15. The actual waveforms $\{s_i(t)\}$ depend on
the choice of the $\{\varphi_j(t)\}$, but their geometric representation depends only
on the $\{\mathbf s_i\}$.

Figure 4.15 Another set of waveforms corresponding to the vector diagram of Fig.
4.14.
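The correspondence between the vectors of Eq. 4.36 and the phase-modulated waveforms of Eq. 4.37a can be confirmed numerically. The following sketch builds each waveform from its coefficient vector by Eq. 4.29 and compares it with the closed form; the numerical values of `T`, `f0`, and `Es`, and all function names, are hypothetical choices for the illustration.

```python
import math

# Hypothetical parameter values; f0 is an integral multiple of 1/T as required.
T, f0, Es = 1.0, 3.0, 2.0
root = math.sqrt(Es)
vectors = [(0.0, root), (-root, 0.0), (0.0, -root), (root, 0.0)]   # Eq. 4.36

def phi1(t):
    return math.sqrt(2.0 / T) * math.sin(2.0 * math.pi * f0 * t)   # Eq. 4.35a

def phi2(t):
    return math.sqrt(2.0 / T) * math.cos(2.0 * math.pi * f0 * t)   # Eq. 4.35b

def synthesized(i, t):
    """s_i(t) built from its coefficient vector, as in Eq. 4.29."""
    return vectors[i][0] * phi1(t) + vectors[i][1] * phi2(t)

def phase_modulated(i, t):
    """Closed form of Eq. 4.37a."""
    return math.sqrt(2.0 * Es / T) * math.cos(2.0 * math.pi * f0 * t + i * math.pi / 2.0)

grid = [k * T / 200 for k in range(201)]   # sample instants in 0 <= t <= T
max_diff = max(abs(synthesized(i, t) - phase_modulated(i, t))
               for i in range(4) for t in grid)
```

The largest discrepancy over the grid is at the level of floating-point rounding, consistent with the two descriptions being the same four waveforms.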

Recovery of the Signal Vectors 

So far we have considered the synthesis of the signal waveforms $\{s_i(t)\}$
from corresponding signal vectors $\{\mathbf s_i\}$. It is also straightforward to
recover the vectors from the waveforms. We observe that by virtue of
the orthonormality of the $\{\varphi_j(t)\}$ (Eq. 4.30)

$$\int_{-\infty}^{\infty} s_i(t)\, \varphi_l(t)\, dt = \sum_{j=1}^{N} s_{ij} \int_{-\infty}^{\infty} \varphi_j(t)\, \varphi_l(t)\, dt = s_{il}.$$

Carrying out the multiplication and integration for each $\varphi_l(t)$, $1 \le l \le N$,
we obtain

$$\mathbf s_i = (s_{i1}, s_{i2}, \ldots, s_{iN}).$$

The procedure can be implemented as shown in the block diagram of
Fig. 4.16. If $s(t)$ is applied at the input, the output is a vector

$$\mathbf s = (s_1, s_2, \ldots, s_N) \tag{4.40a}$$

with components

$$s_j = \int_{-\infty}^{\infty} s(t)\, \varphi_j(t)\, dt; \qquad j = 1, 2, \ldots, N. \tag{4.40b}$$
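As a numerical illustration of Eqs. 4.29 and 4.40b, the sketch below synthesizes a waveform from a coefficient vector and then recovers the coefficients by multiplying by each basis function and integrating. It assumes the frequency-translated pulses of Eq. 4.32 as the orthonormal base and approximates the integrals with a midpoint sum; the function names, the step count, and the coefficient values are hypothetical.

```python
import math

T, N = 1.0, 3   # hypothetical duration and number of basis functions

def phi(j, t):
    """Frequency-translated pulse of Eq. 4.32."""
    if 0.0 <= t <= T:
        return math.sqrt(2.0 / T) * math.sin(2.0 * math.pi * j * t / T)
    return 0.0

def synthesize(coeffs, t):
    """s(t) = sum_j s_j * phi_j(t), as in Eq. 4.29."""
    return sum(c * phi(j + 1, t) for j, c in enumerate(coeffs))

def recover(waveform, steps=2000):
    """Approximate s_j = integral of s(t) * phi_j(t) dt (Eq. 4.40b) by a midpoint sum."""
    dt = T / steps
    times = [(k + 0.5) * dt for k in range(steps)]
    return [sum(waveform(t) * phi(j, t) for t in times) * dt
            for j in range(1, N + 1)]

coeffs = [0.5, -1.25, 2.0]
recovered = recover(lambda t: synthesize(coeffs, t))
```

The recovered values agree with the original coefficients to within the accuracy of the numerical integration, which is the content of the orthonormality argument above.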




Figure 4.16 Extraction of $\mathbf s = (s_1, s_2, \ldots, s_N)$ from $s(t)$. Each of the integrations
extends over the duration of the $\varphi_j(t)$ with which it is associated.

Irrelevant Data

Now suppose that the input to the bank of $N$ multipliers and integrators
in Fig. 4.16 is not $s(t)$, but rather the received random process $r(t)$ of
Fig. 4.1. In this case the integrator outputs, say

$$r_j = \int_{-\infty}^{\infty} r(t)\, \varphi_j(t)\, dt; \qquad j = 1, 2, \ldots, N, \tag{4.41a}$$

are random variables which together constitute a random vector

$$\mathbf r_1 = (r_1, r_2, \ldots, r_N). \tag{4.41b}$$

Since $r(t) = s(t) + n_w(t)$, we have

$$\mathbf r_1 = \mathbf s + \mathbf n, \tag{4.42}$$

where

$$\mathbf n = (n_1, n_2, \ldots, n_N) \tag{4.43a}$$

is the random vector with components

$$n_j = \int_{-\infty}^{\infty} n_w(t)\, \varphi_j(t)\, dt; \qquad j = 1, 2, \ldots, N. \tag{4.43b}$$

We assume that $n_w(t)$, hence $\mathbf n$, is statistically independent of $\mathbf s$.

Were it not for the noise vector $\mathbf n$, we have seen that $\mathbf r_1$ would coincide
with whichever one of the $\{\mathbf s_i\}$ was actually transmitted. When the
presence of $\mathbf n$ cannot be neglected, this, of course, is no longer true.
What is true, however, is that the vector $\mathbf r_1$ in and by itself does contain
all data from $r(t)$ that is relevant to the optimum determination of the
transmitted message. The objective of this section is to prove this
important fact.

The first step in the proof is to note that the waveform equation
corresponding to the vector equality of Eq. 4.42 is

$$r_1(t) = s(t) + n(t), \tag{4.44a}$$

in which

$$r_1(t) \triangleq \sum_{j=1}^{N} r_j\, \varphi_j(t), \tag{4.44b}$$

$$n(t) \triangleq \sum_{j=1}^{N} n_j\, \varphi_j(t). \tag{4.44c}$$

The next step in the proof is to note that in terms of these random
processes we may write

$$r(t) = r_1(t) + r_2(t), \tag{4.45a}$$

where

$$\begin{aligned}
r_2(t) &= r(t) - r_1(t) \\
&= [s(t) + n_w(t)] - [s(t) + n(t)] \\
&= n_w(t) - n(t)
\end{aligned} \tag{4.45b}$$

is a random process that is independent of the signal transmitted. The
fact that $r_2(t)$ is not in general identically zero implies that the noise
process $n_w(t)$ cannot be represented with complete fidelity by the finite
orthonormal set $\{\varphi_j(t)\}$.

We have succeeded in Eq. 4.45a in decomposing the received waveform
$r(t)$ into two waveforms, $r_1(t)$ and $r_2(t)$, the first entirely specified by the
vector $\mathbf r_1$ and the second independent of the transmitted signal. We now
show that the optimum receiver may disregard $r_2(t)$ and therefore base
its decision solely upon the vector $\mathbf r_1 = \mathbf s + \mathbf n$.

Observe that any finite set of time samples taken from $r_2(t)$, say

$$\mathbf r_2 = (r_2(t_1), r_2(t_2), \ldots, r_2(t_q)), \tag{4.46}$$

depends only on $n_w(t)$. Since this is true also of $\mathbf n$, the vectors $\mathbf r_2$ and $\mathbf n$
are jointly independent of $\mathbf s$. As a preliminary to invoking the theorem of
irrelevance (Eq. 4.25b and Fig. 4.8), we observe in consequence that

$$p_{\mathbf r_2 \mid \mathbf r_1, \mathbf s} = p_{\mathbf r_2 \mid \mathbf n, \mathbf s} = \frac{p_{\mathbf r_2, \mathbf n, \mathbf s}}{p_{\mathbf n, \mathbf s}} = \frac{p_{\mathbf r_2, \mathbf n}\, p_{\mathbf s}}{p_{\mathbf n}\, p_{\mathbf s}} = p_{\mathbf r_2 \mid \mathbf n}.$$

Thus $\mathbf r_2$ may be discarded by the optimum receiver provided that it is
also independent of $\mathbf n$. Since a random process is completely described
by the statistical behavior of finite sets of time samples, it follows that the
entire process $r_2(t)$ may be discarded whenever the statistical independence
of $\mathbf r_2$ and $\mathbf n$ holds true for every possible finite set of sampling instants.
In other words, the random process $r_2(t)$ may be
ignored if it is statistically independent of the process $n(t)$.

The required proof of statistical independence rests on the fact that
both $n(t)$ and $r_2(t)$ result from linear operations (integration, addition,
and subtraction) on the Gaussian process $n_w(t)$. Thus $n(t)$ and $r_2(t)$ are
jointly Gaussian processes, so that by analogy with Eq. 3.130 any two
random vectors obtained from $n(t)$ and $r_2(t)$, respectively, are statistically
independent if the covariance

$$E[n(s)\, r_2(t)] - E[n(s)]\, E[r_2(t)]$$

vanishes for all observation instants $t$ and $s$. In particular, since $n_w(t)$,
hence $n(s)$ and $r_2(t)$ as well, are zero mean, it suffices to show that

$$E[n(s)\, r_2(t)] = 0; \qquad \text{for all } t \text{ and } s. \tag{4.47a}$$

From Eq. 4.44c we have

$$E[n(s)\, r_2(t)] = E\left[r_2(t) \sum_{j=1}^{N} n_j\, \varphi_j(s)\right] = \sum_{j=1}^{N} E[n_j\, r_2(t)]\, \varphi_j(s), \tag{4.47b}$$

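so that we need prove only that

$$\overline{n_j\, r_2(t)} = 0; \qquad \text{for all } j \text{ and } t, \tag{4.47c}$$

the overbar denoting expectation.

In order to verify Eq. 4.47c, we note from the definitions of Eqs. 4.43
and 4.45 that

$$\begin{aligned}
\overline{n_j\, r_2(t)} &= \overline{n_j [n_w(t) - n(t)]} = \overline{n_j\, n_w(t)} - \overline{n_j\, n(t)} \\
&= \overline{n_w(t) \int_{-\infty}^{\infty} n_w(\alpha)\, \varphi_j(\alpha)\, d\alpha} - \sum_{l=1}^{N} \overline{n_j n_l}\, \varphi_l(t).
\end{aligned} \tag{4.48a}$$

The integral can be evaluated with the help of Eq. 3.136b:

$$\begin{aligned}
\overline{n_w(t) \int_{-\infty}^{\infty} n_w(\alpha)\, \varphi_j(\alpha)\, d\alpha} &= \int_{-\infty}^{\infty} \mathcal R_w(t - \alpha)\, \varphi_j(\alpha)\, d\alpha \\
&= \frac{\mathcal N_0}{2} \int_{-\infty}^{\infty} \delta(t - \alpha)\, \varphi_j(\alpha)\, d\alpha = \frac{\mathcal N_0}{2}\, \varphi_j(t).
\end{aligned} \tag{4.48b}$$

Evaluation of the sum follows from the fact that

$$\begin{aligned}
\overline{n_j n_l} &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \overline{n_w(\alpha)\, n_w(\beta)}\, \varphi_j(\alpha)\, \varphi_l(\beta)\, d\alpha\, d\beta \\
&= \frac{\mathcal N_0}{2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \delta(\alpha - \beta)\, \varphi_j(\alpha)\, \varphi_l(\beta)\, d\alpha\, d\beta \\
&= \frac{\mathcal N_0}{2} \int_{-\infty}^{\infty} \varphi_j(\beta)\, \varphi_l(\beta)\, d\beta = \frac{\mathcal N_0}{2}\, \delta_{jl},
\end{aligned} \tag{4.48c}$$

so that

$$\sum_{l=1}^{N} \overline{n_j n_l}\, \varphi_l(t) = \frac{\mathcal N_0}{2}\, \varphi_j(t). \tag{4.48d}$$

Substituting Eqs. 4.48b and 4.48d into Eq. 4.48a, we have

$$\overline{n_j\, r_2(t)} = \frac{\mathcal N_0}{2}\, \varphi_j(t) - \frac{\mathcal N_0}{2}\, \varphi_j(t) = 0; \qquad \text{for all } j \text{ and } t, \tag{4.48e}$$

which was to be shown.

This completes the proof that the process $r_2(t)$ is statistically independent
of $n(t)$. We conclude that the vector $\mathbf r_1$ defined by Eqs. 4.41 does in fact
contain all data relevant to the optimum determination of $m$ for the
communication system of Fig. 4.1.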

Joint Density Function of the Relevant Noise 

In addition to the result that $r_2(t)$ is irrelevant, the foregoing analysis
yields valuable information about $\mathbf r_1$. First, Eq. 4.42 establishes that the
relevant effect of the additive white Gaussian noise $n_w(t)$ is to disturb the
transmitted vector $\mathbf s$ by the addition of a random noise vector $\mathbf n$:

$$\mathbf r_1 = \mathbf s + \mathbf n. \tag{4.49a}$$

Second, the discussion leading to Eq. 4.47 implies that $\mathbf n$ is a set of $N$
jointly Gaussian random variables, $\{n_j\}$, each of which has zero mean:

$$\overline{n_j} = 0; \qquad j = 1, 2, \ldots, N. \tag{4.49b}$$

Third, Eq. 4.48c establishes that the $\{n_j\}$ have zero covariance and equal
variance:

$$\overline{n_j n_l} = \begin{cases} \mathcal N_0/2; & l = j, \\ 0; & l \neq j. \end{cases} \tag{4.49c}$$

Thus the joint density function $p_{\mathbf n}$, in the notation of Eq. 4.17, is

$$p_{\mathbf n}(\boldsymbol\alpha) = \frac{1}{(\pi \mathcal N_0)^{N/2}}\, e^{-|\boldsymbol\alpha|^2/\mathcal N_0}, \tag{4.49d}$$

which implies that the $\{n_j\}$ are statistically independent. In particular,
we note that $p_{\mathbf n}$ is spherically symmetric, that is, that $p_{\mathbf n}(\boldsymbol\alpha)$ depends on
the magnitude but not on the direction of the argument vector $\boldsymbol\alpha$.
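The statistics of Eqs. 4.49b and 4.49c can be illustrated by simulation. The sketch below is an approximation under explicit assumptions, not part of the text: white noise of power density N0/2 is replaced by independent Gaussian samples of variance N0/(2·dt) on a grid of spacing dt, the orthonormal base is taken to be the first two pulses of Eq. 4.32, and the function name and all numerical values are hypothetical.

```python
import math
import random

def noise_components(N0=2.0, T=1.0, steps=50, trials=2000, seed=2):
    """Project simulated white noise onto two orthonormal functions (Eq. 4.43b)."""
    rng = random.Random(seed)
    dt = T / steps
    # Discrete stand-in for white noise: independent samples of variance N0 / (2 dt).
    sample_sigma = math.sqrt(N0 / (2.0 * dt))
    times = [(k + 0.5) * dt for k in range(steps)]
    basis = [[math.sqrt(2.0 / T) * math.sin(2.0 * math.pi * j * t / T) for t in times]
             for j in (1, 2)]
    samples = []
    for _ in range(trials):
        noise = [rng.gauss(0.0, sample_sigma) for _ in range(steps)]
        samples.append([sum(n * p for n, p in zip(noise, b)) * dt for b in basis])
    return samples

samples = noise_components()   # N0 = 2, so N0/2 = 1
var1 = sum(a * a for a, _ in samples) / len(samples)    # estimate of mean of n_1^2
var2 = sum(b * b for _, b in samples) / len(samples)    # estimate of mean of n_2^2
cov12 = sum(a * b for a, b in samples) / len(samples)   # estimate of mean of n_1 n_2
```

Within sampling error, both variances come out near N0/2 and the cross term near zero, in agreement with Eq. 4.49c.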

Invariance of the Vector Channel to Choice of Orthonormal Base 

Since a receiver need never consider the process $r_2(t)$ of Eq. 4.45, we
shall henceforth disregard it and designate the relevant received vector
simply by $\mathbf r$ rather than by $\mathbf r_1$.

Once provision is made for calculating the vector $\mathbf r$, the remaining
receiver design problem is precisely the same as the vector receiver
problem which we have already considered in connection with Fig. 4.2
and Eqs. 4.13 and 4.17, with the variance $\sigma^2$ set equal to $\mathcal N_0/2$. The
relationship between the vector and waveform channels is illustrated in
Fig. 4.17, in which we break both the transmitter and receiver into two
parts. The “vector transmitter” accepts the input message $m$ and generates
the vector $\mathbf s_i$ whenever $m = m_i$; the “modulator” then constructs $s_i(t)$
from $\mathbf s_i$ and the waveforms $\{\varphi_j(t)\}$, which we call the orthonormal base.
At the receiver the “detector” operates on the received waveform $r(t)$ and
produces the relevant vector $\mathbf r$; the “vector receiver” then determines
which message is most probable from observation of $\mathbf r$ and knowledge of
the $\{\mathbf s_i\}$ and $\{P[m_i]\}$.

Figure 4.17 Reduction of waveform channel to vector channel. The modulator
converts $\mathbf s$ to $s(t)$ by the mechanism of Fig. 4.12. The detector extracts the relevant
received vector $\mathbf r$ from $r(t)$ by the mechanism of Fig. 4.16.

We have already noted that a particular geometric configuration of the
signal vectors $\{\mathbf s_i\}$ may be converted to many different sets of waveforms
$\{s_i(t)\}$ by appropriate choice of the orthonormal base. In addition, we
now note that the derivation of $p_{\mathbf n}$ relies only on the fact that the $\{\varphi_j(t)\}$
are orthonormal and depends in no way on the specific waveshapes of
these functions. Thus, as claimed earlier, whenever their vector representations
$\{\mathbf s_i\}$ are the same, systems with different sets of transmitter
signals $\{s_i(t)\}$, $i = 0, 1, \ldots, M - 1$, reduce to the same vector channel
and yield the same minimum probability of error, $P[\mathcal E]$. The expression
for $P[\mathcal E]$ is given in Eq. 4.20, with $\sigma^2$ specialized to $\mathcal N_0/2$ in accordance
with Eq. 4.49c.


4.4 RECEIVER IMPLEMENTATION 

We have seen so far that the optimum receiver in Fig. 4.17 performs
two functions: first, the receiver calculates the relevant data vector

$$\mathbf r = (r_1, r_2, \ldots, r_N) \tag{4.50a}$$

with components

$$r_j = \int_{-\infty}^{\infty} r(t)\, \varphi_j(t)\, dt; \qquad j = 1, 2, \ldots, N. \tag{4.50b}$$

Then, in accordance with Eq. 4.19 (with $\sigma^2 = \mathcal N_0/2$), the receiver sets
$\hat m = m_k$ if the decision function

$$|\mathbf r - \mathbf s_i|^2 - \mathcal N_0 \ln P[m_i] \tag{4.51}$$

is minimum for $i = k$. In practice, squarers are avoided by recognizing
that

$$|\mathbf r - \mathbf s_i|^2 = \sum_{j=1}^{N} (r_j - s_{ij})^2 = \sum_{j=1}^{N} (r_j^2 - 2 r_j s_{ij} + s_{ij}^2) = |\mathbf r|^2 - 2\, \mathbf r \cdot \mathbf s_i + |\mathbf s_i|^2, \tag{4.52a}$$

in which

$$\mathbf r \cdot \mathbf s_i = \sum_{j=1}^{N} r_j s_{ij} \tag{4.52b}$$

is the dot product of the vectors $\mathbf r$ and $\mathbf s_i$. Since $|\mathbf r|^2$ is independent of $i$,
a decision rule equivalent to minimizing Eq. 4.51 is to maximize the expression

$$(\mathbf r \cdot \mathbf s_i) + c_i, \tag{4.53a}$$

where

$$c_i = \tfrac{1}{2}\left(\mathcal N_0 \ln P[m_i] - |\mathbf s_i|^2\right); \qquad i = 0, 1, \ldots, M - 1. \tag{4.53b}$$
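The equivalence between minimizing Eq. 4.51 and maximizing Eq. 4.53a can be checked on sample data. The sketch below is illustrative only: the function names, the signal set, the a priori probabilities, and the value of N0 are hypothetical, and the received vectors are simply drawn at random.

```python
import math
import random

def min_distance_choice(r, signals, priors, N0):
    """Index minimizing |r - s_i|^2 - N0 ln P[m_i] (Eq. 4.51)."""
    def metric(i):
        return sum((a - b) ** 2 for a, b in zip(r, signals[i])) - N0 * math.log(priors[i])
    return min(range(len(signals)), key=metric)

def correlation_choice(r, signals, priors, N0):
    """Index maximizing (r . s_i) + c_i (Eqs. 4.53a and 4.53b)."""
    def score(i):
        c_i = 0.5 * (N0 * math.log(priors[i]) - sum(s * s for s in signals[i]))
        return sum(a * b for a, b in zip(r, signals[i])) + c_i
    return max(range(len(signals)), key=score)

# Hypothetical signal set with unequal energies and unequal priors.
signals = [(1.0, 0.0), (0.0, 2.0), (-1.5, -0.5), (0.3, 0.3)]
priors = [0.4, 0.3, 0.2, 0.1]
N0 = 0.8

rng = random.Random(3)
points = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)) for _ in range(1000)]
agree = all(min_distance_choice(r, signals, priors, N0) ==
            correlation_choice(r, signals, priors, N0) for r in points)
```

The two rules differ only by the term |r|², which is common to every i, and by a factor of −2, so they select the same index for every received vector.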


Correlation Receiver 

When the relevant received vector $\mathbf r$ is obtained from the received
waveform by the bank of $N$ multipliers and integrators shown in Fig. 4.16,
the receiver is called a correlation receiver. When $M$ is not large, the
numbers

$$\mathbf r \cdot \mathbf s_i = \sum_{j=1}^{N} r_j s_{ij}; \qquad i = 0, 1, \ldots, M - 1$$

can be obtained from $\mathbf r$ and knowledge of the $\{\mathbf s_i\}$ by attaching a set of
$M$ resistor weighting networks (with weights proportional to the $\{s_{ij}\}$) to
the integrator outputs or by other analog computer techniques. When $M$
is very large, digital computation of the $\{\mathbf r \cdot \mathbf s_i\}$ becomes preferable. A
block diagram of an optimum correlation receiver is shown in Fig. 4.18.


Matched Filter Receiver 

If each member of the orthonormal base $\{\varphi_j(t)\}$ is identically zero
outside some finite time interval, say $0 \le t \le T$, the use of the multipliers
shown in Fig. 4.18 can be avoided. This is desirable, since accurate
analog multipliers are hard to build. Consider, for instance, the output
$u_j(t)$ of a linear filter with impulse response $h_j(t)$. When $r(t)$ is the filter
input, we have

$$u_j(t) = \int_{-\infty}^{\infty} r(\alpha)\, h_j(t - \alpha)\, d\alpha. \tag{4.54a}$$

If we now set

$$h_j(t) = \varphi_j(T - t), \tag{4.54b}$$

the output is

$$u_j(t) = \int_{-\infty}^{\infty} r(\alpha)\, \varphi_j(T - t + \alpha)\, d\alpha. \tag{4.54c}$$

Finally, the output sampled at time $t = T$ is

$$u_j(T) = \int_{-\infty}^{\infty} r(\alpha)\, \varphi_j(\alpha)\, d\alpha = r_j, \tag{4.54d}$$

where the second equality follows from Eq. 4.50b. Thus the optimum
decision rule of Eq. 4.53 can also be implemented by the receiver shown
in Fig. 4.19.

Figure 4.18 Diagram of the correlation receiver. The bias terms $\{c_i\}$ are given by
Eq. 4.53b.

A filter whose impulse response is a delayed, time-reversed version of a
signal $\varphi_j(t)$ is called matched to $\varphi_j(t)$, and the optimum receiver realization
of Fig. 4.19 is a matched filter receiver. The requirement that $\varphi_j(t)$ vanish
for $t > T$ is necessary in order that the matched filters be physically
realizable, that is, in order that $h_j(t) = 0$ for $t < 0$.

For both the correlation and matched filter optimum receiver realizations
we note that the “bias” terms

$$c_i = \tfrac{1}{2}\left(\mathcal N_0 \ln P[m_i] - |\mathbf s_i|^2\right)$$

represent a priori data that are available to the receiver independent of
the received signal $r(t)$. In the particular case in which the bias term is
the same for every $i$ (in particular, when $|\mathbf s_i|^2$ is constant and $P[m_i] = 1/M$
for all $i$), these bias terms do not affect the choice of index $i$ that maximizes
the decision function of Eq. 4.53 and may therefore be deleted from the
receiver diagrams in Figs. 4.18 and 4.19 without loss of optimality.

Figure 4.19 Diagram of the matched-filter receiver.

A simple example of a matched filter occurs when the signal to be
matched is

$$\varphi_j(t) = \begin{cases} \sqrt{\dfrac{2}{T}}\, \cos 2\pi f_j t; & 0 \le t \le T, \\ 0; & \text{elsewhere}, \end{cases} \tag{4.55a}$$

where $f_j$ is an integral multiple of $1/2T$. Then

$$h_j(t) = \varphi_j(T - t) = \begin{cases} \pm\sqrt{\dfrac{2}{T}}\, \cos 2\pi f_j t; & 0 \le t \le T, \\ 0; & \text{elsewhere}, \end{cases} \tag{4.55b}$$

as shown in Fig. 4.20a.

The voltage response of the infinite-Q parallel tuned circuit shown in
Fig. 4.20b to a unit impulse of current is

$$h(t) = \frac{1}{C} \cos \frac{t}{\sqrt{LC}}; \qquad 0 \le t < \infty,$$

where we have assumed that the initial energy storage in $L$ and $C$ at time
$t = 0$ is zero. It is clear that when $1/\sqrt{LC} = 2\pi f_j$ and $1/C = \sqrt{2/T}$ the
impulse response $h(t)$ coincides (to within the sign) with $h_j(t)$ over the interval $0 \le t \le T$,
although it does not do so for $t > T$. Thus the matched filtering operation
for $\varphi_j(t)$ can be instrumented as shown in Fig. 4.20c. The parallel switch
closing briefly at time $t = 0$ dumps any residual energy in the filter,
ensuring that signal energy received earlier than $t = 0$ does not contribute
to the output at time $t = T$. The series switch closing briefly at $t = T$
samples the filter output at the proper time. The entire cycle can be
repeated during the interval $T \le t \le 2T$, although care must be taken to
be sure that the desired output is always sampled just before the filter is
dumped. A matched filter of this sort is called an integrate-and-dump
circuit (Ref. 87). Such a filter is not time invariant, but it does give the desired
impulse response as long as the timing of the switches is properly
synchronized with respect to $\varphi_j(t)$.

Figure 4.20 Integrate-and-dump filter. In application the resonant circuit may be
lossy, so long as its time constant is much greater than $T$.

Parseval relationships. The vector decision function of Eq. 4.53 can be
interpreted directly in terms of time functions by means of the following
Parseval relationship. Consider an orthonormal set $\{\varphi_j(t)\}$, $j = 1, 2, \ldots,
N$, and any two waveforms defined by

$$f(t) \triangleq \sum_{j=1}^{N} f_j\, \varphi_j(t), \tag{4.56a}$$

$$g(t) \triangleq \sum_{j=1}^{N} g_j\, \varphi_j(t), \tag{4.56b}$$

with corresponding vector representations

$$\mathbf f = (f_1, f_2, \ldots, f_N), \tag{4.57a}$$

$$\mathbf g = (g_1, g_2, \ldots, g_N). \tag{4.57b}$$

Then

$$\begin{aligned}
\int_{-\infty}^{\infty} f(t)\, g(t)\, dt &= \int_{-\infty}^{\infty} \sum_{j=1}^{N} \sum_{l=1}^{N} f_j g_l\, \varphi_j(t)\, \varphi_l(t)\, dt \\
&= \sum_{j=1}^{N} \sum_{l=1}^{N} f_j g_l \int_{-\infty}^{\infty} \varphi_j(t)\, \varphi_l(t)\, dt \\
&= \sum_{j=1}^{N} \sum_{l=1}^{N} f_j g_l\, \delta_{jl} = \sum_{j=1}^{N} f_j g_j = \mathbf f \cdot \mathbf g.
\end{aligned}$$

Thus the well-known Parseval equation (Ref. 62) from Fourier theory,

$$\int_{-\infty}^{\infty} f(t)\, g(t)\, dt = \int_{-\infty}^{\infty} F(f)\, G^*(f)\, df,$$

where $F(f)$ and $G(f)$ are the Fourier transforms of $f(t)$ and $g(t)$, can be
extended to read

$$\int_{-\infty}^{\infty} f(t)\, g(t)\, dt = \int_{-\infty}^{\infty} F(f)\, G^*(f)\, df = \mathbf f \cdot \mathbf g. \tag{4.58a}$$

In particular, when $g(t) = f(t)$, we have

$$\int_{-\infty}^{\infty} f^2(t)\, dt = \int_{-\infty}^{\infty} |F(f)|^2\, df = |\mathbf f|^2. \tag{4.58b}$$

Equation 4.58a states that the “correlation” of $f(t)$ and $g(t)$, defined as
the integral of their product, equals the dot product of the corresponding
vectors. Equation 4.58b states that the “energy” of $f(t)$, normalized to
a one-ohm load, equals the square of the length of the corresponding
vector $\mathbf f$.

Equation 4.58b provides an immediate interpretation of the bias term
$c_i$ in the additive white Gaussian noise decision rule of Eq. 4.53. We have

$$c_i = \tfrac{1}{2}\left(\mathcal N_0 \ln P[m_i] - E_i\right), \tag{4.59a}$$

where

$$E_i = \int_{-\infty}^{\infty} s_i^2(t)\, dt = \text{energy of the } i\text{th signal}. \tag{4.59b}$$

Moreover, from Eqs. 4.29 and 4.50 we also have

$$\int_{-\infty}^{\infty} r(t)\, s_i(t)\, dt = \int_{-\infty}^{\infty} r(t) \sum_{j=1}^{N} s_{ij}\, \varphi_j(t)\, dt = \sum_{j=1}^{N} s_{ij} \int_{-\infty}^{\infty} r(t)\, \varphi_j(t)\, dt = \sum_{j=1}^{N} s_{ij} r_j = \mathbf r \cdot \mathbf s_i.$$

Thus, in terms of the complete received waveform $r(t)$, the optimum
decision function of Eq. 4.53a is

$$\int_{-\infty}^{\infty} r(t)\, s_i(t)\, dt + c_i. \tag{4.60}$$

In view of Eq. 4.60, the matched filter (or the correlation) receiver can
be instrumented directly in terms of the $\{s_i(t)\}$, $i = 0, 1, \ldots, M - 1$, as
indicated in Fig. 4.21. At first glance this might appear to eliminate the
need for the weighting and summing operations in Figs. 4.18 and 4.19.


Figure 4.21 An optimum receiver with $M$ filters matched directly to the signals $\{s_i(t)\}$,
which are assumed to have duration $T$. [Each filter output is sampled at $t = T$.]

Actually, of course, these operations are still being performed but now
occur within the $M$ matched filters (or correlators). We have already
remarked (and prove in Appendix 4A) that the number, $N$, of orthonormal
functions required to express any set of $M$ signals $\{s_i(t)\}$ in the form of
Eq. 4.29 is always less than or equal to $M$. When $M \gg N$, a situation
often encountered in practice, it is usually much less expensive to use $N$
filters (or correlators) matched to the $\{\varphi_j(t)\}$, followed by an analog or digital
computer, than it is to use $M$ filters matched directly to
the $\{s_i(t)\}$.

Signal-to-noise ratio. We may gain insight into the optimality of the
matched filtering operation by a signal-to-noise ratio analysis. Consider
the situation illustrated in Fig. 4.22, in which $h(t)$ is an arbitrary linear
filter, $T$ is an arbitrary observation instant, and $\varphi(t)$ is any known signal.
[In particular, we may choose $\varphi(t)$ to be one of the orthonormal base
functions.] The sampled output $r$ may be written

$$r = \bar r + n, \tag{4.61}$$

where $\bar r$, the mean of $r$, depends on $\varphi(t)$ and the noise term $n$ depends on
$n_w(t)$. We now show that the maximum attainable signal-to-noise power
ratio, defined as

$$\mathcal S/\mathcal N \triangleq \bar r^{\,2} / \overline{n^2}, \tag{4.62}$$

occurs when the filter is matched to $\varphi(t)$; that is, when

$$h(t) = \varphi(T - t). \tag{4.63}$$

In application, $T$ is taken large enough that $h(t)$ is realizable.

Figure 4.22 An arbitrary filter, the output of which is sampled at $t = T$. [The filter
input is $r(t) = \varphi(t) + n_w(t)$.]

We prove that this $h(t)$ maximizes $\mathcal S/\mathcal N$ by invoking the Schwarz
inequality, one form of which states that for any pair of finite-energy
waveforms $a(t)$ and $b(t)$

$$\left[\int_{-\infty}^{\infty} a(t)\, b(t)\, dt\right]^2 \le \int_{-\infty}^{\infty} a^2(t)\, dt \int_{-\infty}^{\infty} b^2(t)\, dt. \tag{4.64}$$

The equality obtains if and only if $b(t) = c\, a(t)$, where $c$ is any constant.

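The validity of Eq. 4.64 is evident if we make an orthonormal expansion
of the waveforms $a(t)$ and $b(t)$ by means of the Gram-Schmidt procedure
discussed in Appendix 4A. We then have

$$a(t) = a_1 \psi_1(t) + a_2 \psi_2(t)$$

$$b(t) = b_1 \psi_1(t) + b_2 \psi_2(t),$$

where

$$\int_{-\infty}^{\infty} \psi_i(t)\, \psi_j(t)\, dt = \delta_{ij}; \qquad i, j = 1, 2.$$

Figure 4.23 illustrates that the angle between the two vectors

$$\mathbf a \triangleq (a_1, a_2), \qquad \mathbf b \triangleq (b_1, b_2) \tag{4.65a}$$

is given by

$$\cos\theta = \frac{\mathbf a \cdot \mathbf b}{|\mathbf a|\, |\mathbf b|} = \frac{\displaystyle\int_{-\infty}^{\infty} a(t)\, b(t)\, dt}{\sqrt{\displaystyle\int_{-\infty}^{\infty} a^2(t)\, dt \int_{-\infty}^{\infty} b^2(t)\, dt}}. \tag{4.65b}$$

The second equality above rests on the Parseval relations of Eq. 4.58.
Since $|\cos\theta| \le 1$, squaring Eq. 4.65b yields Eq. 4.64.

Since the substitution of $c\varphi(T - \alpha)$ for $h(\alpha)$ satisfies Eq. 4.66 with the
equality, the ratio $\mathcal S/\mathcal N$ is indeed maximized when $h(t)$ is matched to $\varphi(t)$,
as claimed.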
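The claim can be checked numerically on a sample grid. The sketch below rests on two standard results that are assumptions here, because the derivation of Eq. 4.66 is not reproduced in this excerpt: the mean of the sampled output is the integral of h(α)φ(T − α), and the noise power is (N0/2) times the integral of h²(α). The ramp-shaped signal, the rectangular comparison filter, the value of N0, and all names are hypothetical.

```python
import math

def snr(h, phi_rev, N0, dt):
    """Sampled-output signal-to-noise ratio of Eq. 4.62 for white noise of density N0/2.

    h[k] holds h(alpha_k); phi_rev[k] holds phi(T - alpha_k).
    Assumed: mean = integral of h * phi_rev, noise power = (N0/2) * integral of h^2.
    """
    mean = sum(a * b for a, b in zip(h, phi_rev)) * dt
    noise_power = 0.5 * N0 * sum(a * a for a in h) * dt
    return mean ** 2 / noise_power

T, steps, N0 = 1.0, 500, 0.5
dt = T / steps
alphas = [(k + 0.5) * dt for k in range(steps)]
phi_rev = [a for a in alphas]            # hypothetical ramp: phi(T - alpha) = alpha
energy = sum(p * p for p in phi_rev) * dt

snr_matched = snr(phi_rev, phi_rev, N0, dt)                    # h matched to phi
snr_scaled = snr([3.0 * p for p in phi_rev], phi_rev, N0, dt)  # h = c * phi(T - alpha)
snr_rect = snr([1.0] * steps, phi_rev, N0, dt)                 # unmatched rectangular filter
bound = 2.0 * energy / N0
```

Under these assumptions the matched filter attains the value 2E/N0, where E is the signal energy; scaling the filter leaves the ratio unchanged, and the rectangular filter falls below it, as the Schwarz inequality requires.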

The frequency-domain interpretation of this result is instructive. Since
amplitude scaling affects the signal and noise in the same way, we need
consider only c = 1. Then the transfer function of the matched filter is
given by

$$ H(f) = \int_{-\infty}^{\infty} \varphi(T - t)\, e^{-j2\pi ft}\, dt
        = \int_{-\infty}^{\infty} \varphi(\alpha)\, e^{-j2\pi f(T - \alpha)}\, d\alpha
        = \Phi^*(f)\, e^{-j2\pi fT}, \tag{4.67a} $$

where the signal spectrum is

$$ \Phi(f) = |\Phi(f)|\, e^{j\theta(f)} = \int_{-\infty}^{\infty} \varphi(t)\, e^{-j2\pi ft}\, dt. \tag{4.67b} $$

Thus

$$ H(f) = |\Phi(f)|\, e^{-j[\theta(f) + 2\pi fT]}. \tag{4.67c} $$

In accordance with the inverse Fourier transform,

$$ \varphi(t) = \int_{-\infty}^{\infty} \Phi(f)\, e^{j2\pi ft}\, df, \tag{4.68} $$

we may interpret the filter input $\varphi(t)$ to be a composite of many small
(complex) sinusoids: the sinusoid at frequency $f_1$ has amplitude $|\Phi(f_1)|\, df$
and phase $\theta(f_1)$. In passing through the filter this component is multiplied
by $H(f_1)$, which changes its magnitude to $|\Phi(f_1)|^2\, df$ and its phase to

$$ \theta(f_1) - [\theta(f_1) + 2\pi f_1 T] = -2\pi f_1 T. $$

Thus the filter-output sinusoid at frequency $f_1$ is

$$ |\Phi(f_1)|^2\, df\, e^{j2\pi f_1 (t - T)}, $$

which has a maximum at t = T. Since this is true for every $f_1$, all of the
frequency components of $\varphi(t)$ are brought into phase coincidence and
reinforce each other at t = T; as shown in Fig. 4.24, an output signal
peak is produced at this instant.

Appreciation of the effect of the spectral-amplitude shaping caused by
|H(f)| can be gained by contrasting the matched filter with an inverse
filter, which has the transfer function

$$ \frac{e^{-j2\pi fT}}{\Phi(f)} = \frac{1}{|\Phi(f)|}\, e^{-j[\theta(f) + 2\pi fT]}. \tag{4.69} $$

Figure 4.24 An example illustrating that the output of the matched filter is maximum
at the instant t = T.

Figure 4.25 The inverse filter has high gain at frequencies for which $|\Phi(f)|$ is small,
whereas the matched filter gain is proportional to $|\Phi(f)|$.

The inverse filter also brings all components of $\varphi(t)$ into phase coincidence.
As shown in Fig. 4.25, however, the weaker components of $\varphi(t)$ are
accentuated by the inverse filter, whereas they are suppressed by the
matched filter. Since the noise spectrum $\mathcal{S}_w(f)$ is flat over all frequencies,
the inverse filter exalts the out-of-band noise and the matched filter
subdues it.
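The matched-filter result of Eq. 4.63 can be checked numerically. The following is a minimal sketch under a discrete-time analogue, which is an assumption made here for illustration and not the text's continuous-time setting: the known signal is a short list of samples, the white noise has variance `noise_var` per sample, and the output is read at the last sample index. The function name `sampled_snr` and all sample values are hypothetical.

```python
def sampled_snr(phi, h, noise_var):
    """Signal-to-noise power ratio at the sampling instant T = len(phi) - 1.

    The signal part of the sampled output is sum_k h[T - k] * phi[k];
    white noise of variance noise_var per sample gives an output noise
    power of noise_var * sum_k h[k]**2.
    """
    T = len(phi) - 1
    r_bar = sum(h[T - k] * phi[k] for k in range(len(phi)))
    noise_power = noise_var * sum(x * x for x in h)
    return r_bar ** 2 / noise_power


phi = [0.0, 1.0, 2.0, 1.0, -1.0]   # arbitrary known signal samples (assumed)
matched = phi[::-1]                # h[k] = phi[T - k], the matched choice
boxcar = [1.0] * len(phi)          # a mismatched alternative for contrast
noise_var = 0.5                    # assumed noise variance per sample

snr_matched = sampled_snr(phi, matched, noise_var)
snr_boxcar = sampled_snr(phi, boxcar, noise_var)
energy = sum(x * x for x in phi)   # signal energy, the Schwarz upper limit
```

Under these assumptions the matched choice attains the signal energy divided by the noise variance, which is the limit set by the Schwarz inequality, while the mismatched boxcar filter falls short of it.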

Component Accuracy 

So far we have presumed that the receiver knows exactly both the
transmitter signal vectors $\{\mathbf{s}_i\}$ and the orthonormal base functions $\{\varphi_j(t)\}$.
In practice, of course, limitations on component accuracy render this
knowledge only approximate. Alternatively, in the interests of economy
we might wish to settle for a system that is somewhat less than optimum.

In general, calculation of the precise trade-off between error per- 
formance and the precision of receiver instrumentation is both tedious 
and unrewarding. It is more instructive to visualize the nature and extent 
of the problem geometrically. For example, assume that there are two 
equally likely transmitter signals, say 

$$ s(t) = \pm \sqrt{E_s}\, \varphi_1(t). \tag{4.70a} $$

The corresponding vector representation is illustrated by the black dots
in Fig. 4.26. The receiver's approximations to these signals might be

$$ \hat{s}(t) = \pm \left[ \sqrt{E_1}\, \varphi_1(t) + \sqrt{E_2}\, \varphi_2(t) \right]. \tag{4.70b} $$

Figure 4.26 The effect of receiver approximation.


These approximations are represented vectorially by the open dots in the
figure. The second orthonormal function $\varphi_2(t)$ is introduced to permit
complete generality in representing the receiver's approximation of $\varphi_1(t)$.
A receiver matched to these approximate signals would employ the
decision boundary indicated by the dotted line in Fig. 4.26, whereas an
optimum receiver would use the $\varphi_2$-axis as the decision boundary. It is
clear that the degradation in error performance is small as long as the
receiver's approximations of the $\{s_i(t)\}$ are sufficiently accurate that the
probability of the received vector r falling into the shaded area is small
compared with the optimum $P[\mathcal{E}]$. This condition is met in general
whenever

$$ \int_{-\infty}^{\infty} [s_i(t) - \hat{s}_i(t)]^2\, dt $$

is small compared with the square of each intersignal distance, $|\mathbf{s}_i - \mathbf{s}_k|^2$,
for all i and $k \ne i$.

4.5 PROBABILITY OF ERROR 

We have seen in Section 4.3 that the problem of communicating one
of a set of M specified signals $\{s_i(t)\}$ over a channel disturbed only by
additive white Gaussian noise always reduces to a corresponding vector
communication problem. In particular, we recall that the transmitter
signals are represented by M points $\{\mathbf{s}_i\}$ in an N-dimensional space and
that the relevant noise disturbance is represented by an N-dimensional
random vector, n, with the spherically symmetric density function

$$ p_{\mathbf{n}}(\boldsymbol{\alpha}) = \frac{1}{(\pi \mathcal{N}_0)^{N/2}}\, e^{-|\boldsymbol{\alpha}|^2 / \mathcal{N}_0}. \tag{4.71a} $$

In accordance with the discussion leading to Eq. 4.19, the optimum
receiver divides the signal space into a set of M disjoint decision regions
$\{I_i\}$; any point $\boldsymbol{\rho}$ is assigned to $I_k$ if and only if

$$ |\boldsymbol{\rho} - \mathbf{s}_k|^2 - \mathcal{N}_0 \ln P[m_k] < |\boldsymbol{\rho} - \mathbf{s}_i|^2 - \mathcal{N}_0 \ln P[m_i]; \quad \text{for all } i \ne k. \tag{4.71b} $$

The receiver output $\hat{m}$ is then set equal to $m_k$ whenever the received vector

$$ \mathbf{r} = \mathbf{s} + \mathbf{n} \tag{4.71c} $$

lies in $I_k$. Since the vector communication problem is invariant to the
specific orthonormal base $\{\varphi_j(t)\}$, j = 1, 2, . . . , N, that relates the $\{\mathbf{s}_i\}$
and the $\{s_i(t)\}$, the probability of error is independent of the waveshapes
ascribed to the $\{\varphi_j(t)\}$.

In this section we evaluate the minimum attainable error probability
(Eqs. 4.20, with $\sigma^2 = \mathcal{N}_0/2$) for certain important vector signal
configurations. Except for M = 2, we assume that all M a priori probabilities
$\{P[m_i]\}$ are equal. The assumption is justified from an operational point
of view in the discussion of “completely symmetric signals” at the end of
this chapter.

Equivalent Signal Sets 

In addition to signal sets that are equivalent by virtue of the fact that 
their geometrical configurations are identical, different geometrical 
configurations may also be equivalent insofar as error probability is 
concerned. Insight into this fact is gained by considering the geometry 
of the decision regions. 

Rotation and translation of coordinates. In Fig. 4.27a we show a
signal $\mathbf{s}_i$ and its decision region $I_i$. Whenever $\mathbf{s}_i$ is transmitted, a correct
decision results if $\mathbf{n} + \mathbf{s}_i$ falls within $I_i$. The probability of this event is
unaffected if $\mathbf{s}_i$ and $I_i$ are translated together through signal space. This
follows, in accordance with Eqs. 4.71, from the fact that the noise n is
additive and its density function $p_{\mathbf{n}}$ is independent of the signal. Moreover,
since $p_{\mathbf{n}}$ is spherically symmetric, as indicated by the contours of
constant probability density in the illustration, the probability that $\mathbf{n} + \mathbf{s}_i$
will fall in $I_i$ is also unaffected by a rotation of $I_i$ about $\mathbf{s}_i$. Thus $\mathbf{s}_i$ and $I_i$
may be simultaneously translated and rotated, as in Fig. 4.27b, without
affecting the conditional probability of a correct decision, $P[\mathcal{C} \mid m_i]$.

Figure 4.27 Equivalent decision regions. The concentric circles represent loci of
constant $p_{\mathbf{n}}$.

Minimum-energy signals. Although the probability of a correct decision
is invariant to translation, such a transformation does affect the energy
required to transmit each signal: in general, $\mathbf{s}_i' = \mathbf{s}_i - \mathbf{a}$ implies

$$ E_i \triangleq \int_{-\infty}^{\infty} s_i^2(t)\, dt = |\mathbf{s}_i|^2 \ne |\mathbf{s}_i - \mathbf{a}|^2 = |\mathbf{s}_i'|^2 \triangleq E_i'. \tag{4.72} $$

When there is a constraint, say $E_s$, on the peak energy permitted for
any signal, the vectors $\{\mathbf{s}_i\}$ are constrained to lie within a sphere of radius
$\sqrt{E_s}$, as indicated in Fig. 4.28. A somewhat weaker constraint is that the
mean energy $E_m$, defined as

$$ E_m \triangleq \sum_{i=0}^{M-1} P[m_i]\, E_i = \sum_{i=0}^{M-1} P[m_i]\, |\mathbf{s}_i|^2, \tag{4.73} $$

be less than some fixed value. For a given configuration of signal points
the mean energy can be minimized, without affecting the probability of
error, by subtracting from each signal $\mathbf{s}_i$ a constant vector a so chosen
that

$$ \sum_{i=0}^{M-1} P[m_i]\, |\mathbf{s}_i - \mathbf{a}|^2 $$

is minimum.

Figure 4.28 Peak energy constraint.

How to choose a is obvious once we have recognized that the expression
for $E_m$ is precisely the expression for the moment of inertia around the
origin of a system of M point masses, where the mass of the ith point is
$P[m_i]$ and its position is $\mathbf{s}_i$. Since the moment of inertia is minimum when
taken around the centroid (center of gravity) of a system, it follows that
a should be chosen in such a way that the resulting centroid coincides
with the origin. Given a set of probabilities $\{P[m_i]\}$ and a set of signals
$\{\mathbf{s}_i\}$, the appropriate choice of a is therefore

$$ \mathbf{a} = \sum_{i=0}^{M-1} P[m_i]\, \mathbf{s}_i \triangleq \mathrm{E}[\mathbf{s}]. \tag{4.74a} $$

As proof, we note that for any other translation, say b, we have

$$ \mathrm{E}[|\mathbf{s} - \mathbf{b}|^2] = \mathrm{E}[|(\mathbf{s} - \mathbf{a}) + (\mathbf{a} - \mathbf{b})|^2] $$

$$ \phantom{\mathrm{E}[|\mathbf{s} - \mathbf{b}|^2]} = \mathrm{E}[|\mathbf{s} - \mathbf{a}|^2] + 2(\mathbf{a} - \mathbf{b}) \cdot (\mathrm{E}[\mathbf{s}] - \mathbf{a}) + |\mathbf{a} - \mathbf{b}|^2 \tag{4.74b} $$

$$ \phantom{\mathrm{E}[|\mathbf{s} - \mathbf{b}|^2]} = \mathrm{E}[|\mathbf{s} - \mathbf{a}|^2] + |\mathbf{a} - \mathbf{b}|^2, $$

where the last equality follows from Eq. 4.74a. The mean energy is
increased when $\mathbf{b} \ne \mathbf{a}$. If the mean energy still exceeds the allowable
maximum after the translation a is made, further reduction is possible
only by transformations such as radial scaling that do affect the probability
of error.
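The centroid argument of Eqs. 4.74 is easy to verify numerically. The sketch below is illustrative only: the three signal points, their probabilities, and the helper names `centroid` and `mean_energy` are assumptions chosen for the example, not values from the text.

```python
def centroid(signals, probs):
    """Probability-weighted mean of the signal vectors (Eq. 4.74a)."""
    dim = len(signals[0])
    return [sum(p * vec[j] for vec, p in zip(signals, probs))
            for j in range(dim)]


def mean_energy(signals, probs, shift):
    """Mean energy after subtracting the vector `shift` from every signal."""
    return sum(p * sum((s - a) ** 2 for s, a in zip(vec, shift))
               for vec, p in zip(signals, probs))


# Hypothetical two-dimensional signal set with unequal probabilities.
signals = [(3.0, 1.0), (1.0, 1.0), (2.0, 4.0)]
probs = [0.5, 0.25, 0.25]

a = centroid(signals, probs)                 # minimizing translation
b = [2.0, 2.0]                               # some other translation
e_a = mean_energy(signals, probs, a)
e_b = mean_energy(signals, probs, b)
gap = sum((x - y) ** 2 for x, y in zip(a, b))  # |a - b|^2 in Eq. 4.74b
```

For these assumed values the mean energy after translating by b exceeds the minimum by exactly |a - b|^2, as the last line of Eq. 4.74b states.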

Rectangular Signal Sets 

When the geometric configuration of M equally likely signal vectors 
is rectangular, the calculation of the error probability is especially easy. 
The simplest situation is that in which there are only two signals. 

Binary signals. The general case of two signal vectors, each with
probability 1/2, is shown in Fig. 4.29a. From the standpoint of error
probability, an equivalent signal set is that shown in Fig. 4.29b, in which
the signal configuration has been rotated and translated in such a way
that the centroid coincides with the origin and the vector $(\mathbf{s}_0 - \mathbf{s}_1)$ lies
along the $\varphi_1$ axis.

The optimum decision regions for Fig. 4.29b are determined by the
expression

$$ \min_i \left\{ |\boldsymbol{\rho} - \mathbf{s}_i|^2 - \mathcal{N}_0 \ln P[m_i] \right\}. \tag{4.75} $$

For equal a priori probabilities, this decision rule is just

$$ \min_i |\boldsymbol{\rho} - \mathbf{s}_i|^2. $$

It is clear from Fig. 4.29b that the locus of all points $\boldsymbol{\rho}$ equally distant
from $\mathbf{s}_0$ and $\mathbf{s}_1$ is the $\varphi_2$ axis. Thus an error occurs when $\mathbf{s}_1$ is transmitted
if and only if the noise component $n_1$ exceeds d/2, where d is the distance
between the two signals:

$$ P[\mathcal{E} \mid m_1] = P[\boldsymbol{\rho} \text{ in } I_0 \mid m_1] = P\!\left[ n_1 > \frac{d}{2} \right], $$

where

$$ d^2 = |\mathbf{s}_0 - \mathbf{s}_1|^2 = \int_{-\infty}^{\infty} [s_0(t) - s_1(t)]^2\, dt. $$

But $n_1$ is zero-mean Gaussian with variance $\mathcal{N}_0/2$, so that

$$ P[\mathcal{E} \mid m_1] = \int_{d/2}^{\infty} \frac{1}{\sqrt{\pi \mathcal{N}_0}}\, e^{-\alpha^2/\mathcal{N}_0}\, d\alpha. \tag{4.76a} $$

Setting $\gamma = \alpha \sqrt{2/\mathcal{N}_0}$, we have

$$ P[\mathcal{E} \mid m_1] = \int_{d/\sqrt{2\mathcal{N}_0}}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\gamma^2/2}\, d\gamma = Q\!\left( \frac{d}{\sqrt{2\mathcal{N}_0}} \right). $$

Since, by symmetry, the conditional probability of error is the same for
either signal, we also have

$$ P[\mathcal{E}] = \sum_{i} P[m_i]\, P[\mathcal{E} \mid m_i] = P[\mathcal{E} \mid m_1] = Q\!\left( \frac{d}{\sqrt{2\mathcal{N}_0}} \right). \tag{4.76b} $$

The function Q( ) was defined in Eq. 2.50 and plotted in Fig. 2.36.

Figure 4.29 Binary signal sets for which $P[\mathcal{E}]$ is the same. The signals in (b) are called
“antipodal”; each has energy $E_s = (d/2)^2$.

Equation 4.76b is the minimum error probability for any pair of
equally likely signal vectors separated by a distance d, regardless of their
actual location in signal space. When the signals have minimum energy
and are therefore antipodal as in Fig. 4.29b, the length of each vector is
$\sqrt{E_s}$, so that $d = 2\sqrt{E_s}$ and

$$ P[\mathcal{E}] = Q\!\left( \sqrt{2E_s/\mathcal{N}_0} \right); \quad \text{equally likely antipodal signals.} \tag{4.77} $$

On the other hand, when the signals are orthogonal, as in Fig. 4.30, we
have $d = \sqrt{2E_s}$ and

$$ P[\mathcal{E}] = Q\!\left( \sqrt{E_s/\mathcal{N}_0} \right); \quad \text{two equally likely orthogonal signals.} \tag{4.78} $$



Figure 4.30 Two orthogonal signals. 

It is common engineering practice to express energy ratios in units of
decibels (db), where

$$ \left( \frac{E_s}{\mathcal{N}_0} \right)_{\text{db}} \triangleq 10 \log_{10} \frac{E_s}{\mathcal{N}_0}. $$

For example,

    E_s / N_0        (E_s / N_0) in db

      0.1              -10 db
      1.0                0 db
      2.0                3 db
      3.0                4.8 db
     10.0               10 db
    100.0               20 db

The probabilities of error for antipodal and orthogonal signaling are
plotted in Fig. 4.31 with $E_s/\mathcal{N}_0$ in units of db. The figure illustrates that
antipodal signaling is 3 db more efficient than orthogonal signaling in
communicating one of two equally likely messages.
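Equations 4.77 and 4.78 and the decibel convention can be evaluated directly. The following is a minimal sketch that assumes the function Q( ) is the Gaussian tail integral used in Eq. 4.76b and computes it from the standard library's complementary error function; the function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability: integral of the unit normal density from x to infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ratio_to_db(ratio):
    """Express an energy ratio in decibels."""
    return 10.0 * math.log10(ratio)


def db_to_ratio(db):
    """Convert decibels back to an energy ratio."""
    return 10.0 ** (db / 10.0)


def p_error_antipodal(es_over_n0):
    """Eq. 4.77: two equally likely antipodal signals."""
    return q_function(math.sqrt(2.0 * es_over_n0))


def p_error_orthogonal(es_over_n0):
    """Eq. 4.78: two equally likely orthogonal signals."""
    return q_function(math.sqrt(es_over_n0))
```

Doubling $E_s/\mathcal{N}_0$ for the orthogonal pair reproduces the antipodal error probability exactly; a factor of two is 10 log10 2, about 3.01 db, which is the 3 db advantage read from Fig. 4.31.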

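With binary signals, it is also easy to determine $P[\mathcal{E}]$ when the a priori
probabilities are not equal. As shown in Fig. 4.32, the decision boundary
is shifted from $\mathbf{s}_1$ toward $\mathbf{s}_0$ by an amount

$$ \Delta = \frac{\mathcal{N}_0/2}{d} \ln \frac{P[m_1]}{P[m_0]}. \tag{4.79a} $$

Equation 4.79a is derived from the decision rule of Eq. 4.75 by solving
the equation

$$ |\boldsymbol{\rho} - \mathbf{s}_1|^2 - \mathcal{N}_0 \ln P[m_1] = |\boldsymbol{\rho} - \mathbf{s}_0|^2 - \mathcal{N}_0 \ln P[m_0] $$

for $\boldsymbol{\rho} = (\rho_1, \rho_2)$. For any value of $\rho_2$ we then have

$$ \left( \rho_1 + \frac{d}{2} \right)^2 - \mathcal{N}_0 \ln P[m_1] = \left( \rho_1 - \frac{d}{2} \right)^2 - \mathcal{N}_0 \ln P[m_0], $$

$$ \rho_1 d - \mathcal{N}_0 \ln P[m_1] = -\rho_1 d - \mathcal{N}_0 \ln P[m_0]. $$

Since $\Delta$ is the value of $\rho_1$ satisfying this equation,

$$ 2\Delta d = \mathcal{N}_0 \ln \frac{P[m_1]}{P[m_0]}. $$

The resulting error probability is

$$ P[\mathcal{E}] = P[m_0]\, Q\!\left( \frac{d - 2\Delta}{\sqrt{2\mathcal{N}_0}} \right) + P[m_1]\, Q\!\left( \frac{d + 2\Delta}{\sqrt{2\mathcal{N}_0}} \right). \tag{4.79b} $$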
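The boundary shift and the resulting error probability for unequal a priori probabilities can be sketched as follows. The code assumes the geometry of Fig. 4.29b, with $\mathbf{s}_0$ at +d/2 and $\mathbf{s}_1$ at -d/2 on the $\varphi_1$ axis, and evaluates the two error terms as Gaussian tail probabilities measured from each signal to the shifted boundary; the function names and numerical values are illustrative assumptions.

```python
import math


def q_function(x):
    """Gaussian tail probability of the unit normal density."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def boundary_shift(d, n0, p0, p1):
    """Shift of the decision boundary along the first axis (Eq. 4.79a)."""
    return (n0 / (2.0 * d)) * math.log(p1 / p0)


def p_error_binary(d, n0, p0, p1):
    """Error probability with the optimally shifted boundary.

    With s0 at +d/2, an error given m0 needs the noise to carry the
    received point below the boundary, a distance d/2 - shift away;
    given m1 the distance is d/2 + shift. The noise standard deviation
    is sqrt(n0 / 2).
    """
    shift = boundary_shift(d, n0, p0, p1)
    scale = math.sqrt(2.0 * n0)
    return (p0 * q_function((d - 2.0 * shift) / scale)
            + p1 * q_function((d + 2.0 * shift) / scale))
```

With equal probabilities the shift vanishes and the expression reduces to Eq. 4.76b; with unequal probabilities the optimally placed boundary gives a smaller error probability than the midpoint boundary would.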

Rectangular decision regions. The ease of calculating the error probability
for binary signals is directly attributable to the fact that an error
occurs if and only if one random variable exceeds a given magnitude.
A situation that is only slightly more complicated exists whenever the
decision region boundaries are rectangular. Consider, for example, the
signal $\mathbf{s}_i$ and decision region $I_i$ shown in Fig. 4.33a. After translating $\mathbf{s}_i$
to the origin and rotating the configuration as shown in Fig. 4.33b, we
see immediately that $\mathbf{s}_i + \mathbf{n}$ falls within $I_i$ whenever, simultaneously,

$$ (a_1 < n_1 < b_1) \text{ and } (a_2 < n_2 < b_2). \tag{4.80a} $$

But $n_1$ and $n_2$ are statistically independent (cf. Eq. 4.49), and the density
function, say $p_n$, of each is the same:

$$ p_{n_j}(\alpha) = p_n(\alpha) = \frac{1}{\sqrt{\pi \mathcal{N}_0}}\, e^{-\alpha^2/\mathcal{N}_0}; \quad j = 1, 2. \tag{4.80b} $$

Thus

$$ P[\mathcal{C} \mid m_i] = P[a_1 < n_1 < b_1,\ a_2 < n_2 < b_2] $$

$$ \phantom{P[\mathcal{C} \mid m_i]} = P[a_1 < n_1 < b_1]\, P[a_2 < n_2 < b_2] $$

$$ \phantom{P[\mathcal{C} \mid m_i]} = \int_{a_1}^{b_1} p_n(\alpha)\, d\alpha \int_{a_2}^{b_2} p_n(\alpha)\, d\alpha. \tag{4.80c} $$

Figure 4.31 Probability of error for binary antipodal and binary orthogonal signaling
with equally likely messages.

Figure 4.32 Decision regions for antipodal signals with distance d and unequal a
priori probabilities.

Figure 4.33 A single rectangular decision region.


The optimum decision boundaries are always rectangular when the
signal vector configuration is rectangular and all signals are equally
likely. A simple example is the rectangular configuration of six equally
likely signals shown in Fig. 4.34. We have

$$ P[\mathcal{C} \mid m_0] = \int_{-\infty}^{d/2} p_n(\alpha)\, d\alpha \int_{-\infty}^{d/2} p_n(\alpha)\, d\alpha = (1 - p)^2, \tag{4.81a} $$

where $p = Q(d/\sqrt{2\mathcal{N}_0})$ is the probability of error for two signals separated
by a distance d. From symmetry,

$$ P[\mathcal{C} \mid m_0] = P[\mathcal{C} \mid m_1] = P[\mathcal{C} \mid m_2] = P[\mathcal{C} \mid m_3]. \tag{4.81b} $$

Similarly,

$$ P[\mathcal{C} \mid m_4] = P[\mathcal{C} \mid m_5] = \int_{-d/2}^{d/2} p_n(\alpha)\, d\alpha \int_{-\infty}^{d/2} p_n(\alpha)\, d\alpha = (1 - 2p)(1 - p). \tag{4.81c} $$

Thus

$$ P[\mathcal{C}] = \sum_{i=0}^{5} P[\mathcal{C} \mid m_i]\, P[m_i] = \tfrac{2}{3}(1 - p)^2 + \tfrac{1}{3}(1 - 2p)(1 - p). \tag{4.81d} $$

Figure 4.34 Rectangular decision regions.
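The six-signal calculation of Eqs. 4.81 can be reproduced from the per-axis interval probabilities of Eq. 4.80c. The sketch below assumes a 2 x 3 grid of signals with spacing d, four corner points and two middle points, which is an interpretation of Fig. 4.34; the function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability of the unit normal density."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def prob_between(lo, hi, n0):
    """P[lo < n < hi] for zero-mean Gaussian noise n of variance n0 / 2."""
    sigma = math.sqrt(n0 / 2.0)
    return q_function(lo / sigma) - q_function(hi / sigma)


def p_correct_six_signals(d, n0):
    """Average probability of a correct decision over six equally likely signals."""
    inf = math.inf
    # Corner signals: region is bounded on one side along each axis.
    corner = prob_between(-inf, d / 2, n0) * prob_between(-inf, d / 2, n0)
    # Middle signals: bounded on both sides along one axis, one side along the other.
    middle = prob_between(-d / 2, d / 2, n0) * prob_between(-inf, d / 2, n0)
    return (4.0 * corner + 2.0 * middle) / 6.0
```

Evaluating the region probabilities axis by axis in this way gives the same value as the closed form of Eq. 4.81d.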

Vertices of a hypercube. A special case of rectangular decision regions
occurs when $M = 2^N$ equally likely messages are located on the vertices
of an N-dimensional hypercube centered on the origin. This configuration
is shown geometrically in Fig. 4.35 for N = 2 and 3. Analytically, we have

$$ \mathbf{s}_i = (s_{i1}, s_{i2}, \ldots, s_{iN}); \quad i = 0, 1, \ldots, 2^N - 1, \tag{4.82a} $$

where

$$ s_{ij} = +\frac{d}{2} \ \text{ or } \ -\frac{d}{2}; \quad \text{for all } i, j. \tag{4.82b} $$

To evaluate the error probability, assume that the signal

$$ \mathbf{s}_0 \triangleq \left( -\frac{d}{2}, -\frac{d}{2}, \ldots, -\frac{d}{2} \right) \tag{4.83} $$

is transmitted. We first claim that no error is made if

$$ n_j < \frac{d}{2}; \quad \text{for all } j = 1, 2, \ldots, N. \tag{4.84a} $$

Figure 4.35 panels: (a) N = 2; (b) N = 3.

This follows from the fact that $\boldsymbol{\rho}$ is closer to $\mathbf{s}_j$ than to $\mathbf{s}_0$ whenever Eq.
4.85 is satisfied, where $\mathbf{s}_j$ denotes that signal with components +d/2 in
the jth direction and -d/2 in all other directions. (Of course, $\boldsymbol{\rho}$ may be
still closer to some signal other than $\mathbf{s}_j$, but it cannot be closest to $\mathbf{s}_0$.)

Equations 4.84d and 4.85 together imply that a correct decision is made
if and only if Eq. 4.84a is satisfied. The probability of this event, given
that $m = m_0$, is therefore

$$ P[\mathcal{C} \mid m_0] = P\!\left[ \text{all } n_j < \frac{d}{2};\ j = 1, 2, \ldots, N \right] $$

$$ \phantom{P[\mathcal{C} \mid m_0]} = \prod_{j=1}^{N} P\!\left[ n_j < \frac{d}{2} \right] $$

$$ \phantom{P[\mathcal{C} \mid m_0]} = \left( \int_{-\infty}^{d/2} p_n(\alpha)\, d\alpha \right)^{N} $$

$$ \phantom{P[\mathcal{C} \mid m_0]} = (1 - p)^N, $$

in which

$$ p = Q\!\left( \frac{d}{\sqrt{2\mathcal{N}_0}} \right) \tag{4.86} $$

is again the probability of error for two equally likely signals separated
by distance d. Finally, from symmetry

$$ P[\mathcal{C} \mid m_i] = P[\mathcal{C} \mid m_0]; \quad \text{for all } i. \tag{4.87a} $$

Hence

$$ P[\mathcal{C}] = (1 - p)^N. \tag{4.87b} $$

In order to express this result in terms of signal energy, we again
recognize that the distance squared from the origin to each signal $\mathbf{s}_i$ is the
same. The transmitted energy is therefore independent of i, hence may be
designated $E_s$. From Eqs. 4.58b and 4.82b we have

$$ |\mathbf{s}_i|^2 = \sum_{j=1}^{N} s_{ij}^2 = N\, \frac{d^2}{4} = E_s, \tag{4.88a} $$

so that

$$ p = Q\!\left( \sqrt{\frac{2E_s}{N \mathcal{N}_0}} \right). \tag{4.88b} $$

The simple form of the result $P[\mathcal{C}] = (1 - p)^N$ suggests that a more
immediate derivation may exist. Indeed one does. Note that the jth
coordinate of the random signal s is a priori equally likely to be +d/2
or -d/2, independent of all other coordinates. Moreover, the noise $n_j$
disturbing the jth coordinate is independent of the noise in all other
coordinates. Hence, by the theorem on irrelevance, a decision may be
made on the jth coordinate without examining any other coordinate.
This single-coordinate decision corresponds to the problem of binary
signals separated by distance d, for which the probability of correct decision
is 1 - p. Since in the original hypercube problem a correct decision is made
if and only if a correct decision is made on every coordinate, and since
these decisions are independent, it follows immediately that

$$ P[\mathcal{C}] = (1 - p)^N. \tag{4.90} $$
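The per-coordinate argument can be illustrated with a short sketch. It compares a brute-force minimum-distance search over all hypercube vertices with independent sign decisions on each coordinate, and evaluates $P[\mathcal{C}] = (1 - p)^N$ using the energy relation of Eq. 4.88a. The function names and the use of `itertools.product` to enumerate vertices are choices made for this illustration.

```python
import itertools
import math


def q_function(x):
    """Gaussian tail probability of the unit normal density."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def nearest_vertex(r, d):
    """Minimum-distance decision by exhaustive search over all 2**N vertices."""
    vertices = itertools.product((-d / 2, d / 2), repeat=len(r))
    return min(vertices,
               key=lambda v: sum((x - y) ** 2 for x, y in zip(r, v)))


def per_coordinate_decision(r, d):
    """Independent binary decision on each coordinate of the received vector."""
    return tuple(d / 2 if x > 0 else -d / 2 for x in r)


def p_correct_hypercube(n_dim, es_over_n0):
    """P[C] = (1 - p)**N with d = 2 * sqrt(Es / N), so p = Q(sqrt(2 Es / (N N0)))."""
    p = q_function(math.sqrt(2.0 * es_over_n0 / n_dim))
    return (1.0 - p) ** n_dim
```

For any received vector with no coordinate exactly zero the two decision rules agree, which is the content of the irrelevance argument; the exhaustive search is included only as a cross-check and grows as $2^N$.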

Orthogonal and Related Signal Sets 

Another class of equally likely signals for which the minimum attainable
error probability is quite easy to calculate is the set of M equal-energy
orthogonal vectors. Closely related to them are the simplex and biorthogonal
signal sets. In treating these sets it is convenient to index the
orthonormal axes $\{\boldsymbol{\varphi}_j\}$ from j = 0 to N - 1 rather than from j = 1 to N,
where N is the dimensionality of the signal space.

Orthogonal signals. When M equally likely and equal-energy signals
are mutually orthogonal, so that N = M and

$$ \int_{-\infty}^{\infty} s_i(t)\, s_k(t)\, dt = \mathbf{s}_i \cdot \mathbf{s}_k = E_s\, \delta_{ik}; \quad i, k = 0, 1, \ldots, M - 1, \tag{4.91} $$

the optimum decision region boundaries are no longer rectangular and
are difficult to visualize. It is easier to proceed analytically. Letting $\boldsymbol{\varphi}_j$
denote the unit vector along the jth coordinate axis and

$$ \mathbf{s}_j = \sqrt{E_s}\, \boldsymbol{\varphi}_j; \quad j = 0, 1, \ldots, M - 1, \tag{4.92a} $$

we note that the squared distance from $\mathbf{s}_j$ to the received vector r is

$$ |\mathbf{r} - \mathbf{s}_j|^2 = |\mathbf{r}|^2 + |\mathbf{s}_j|^2 - 2\mathbf{r} \cdot (\sqrt{E_s}\, \boldsymbol{\varphi}_j) $$

$$ \phantom{|\mathbf{r} - \mathbf{s}_j|^2} = |\mathbf{r}|^2 + E_s - 2 r_j \sqrt{E_s}, \tag{4.92b} $$

where $r_j$ is the jth component of r.

When $\mathbf{s}_k$ is transmitted, it follows that

$$ |\mathbf{r} - \mathbf{s}_k|^2 < |\mathbf{r} - \mathbf{s}_i|^2; \quad \text{all } i \ne k $$

if and only if

$$ -2 r_k \sqrt{E_s} < -2 r_i \sqrt{E_s}, \tag{4.93a} $$

i.e.

$$ r_i < r_k; \quad \text{all } i \ne k. \tag{4.93b} $$


As shown in Fig. 4.36, when $\mathbf{s}_0$ is transmitted we have

$$ r_0 = n_0 + \sqrt{E_s} \tag{4.94a} $$

$$ r_i = n_i; \quad i > 0. \tag{4.94b} $$

Thus

$$ P[\mathcal{C} \mid m_0, r_0 = \alpha] = P[n_1 < \alpha,\ n_2 < \alpha,\ \ldots,\ n_{M-1} < \alpha] $$

$$ \phantom{P[\mathcal{C} \mid m_0, r_0 = \alpha]} = (P[n_1 < \alpha])^{M-1}, \tag{4.95a} $$

in which the last equality stems from the fact that all $n_i$ are statistically
independent and identically distributed. Multiplying by

$$ p_{r_0}(\alpha) = p_n(\alpha - \sqrt{E_s}) \tag{4.95b} $$

and integrating yields, for M equally likely equal-energy signals,

$$ P[\mathcal{C} \mid m_0] = \int_{-\infty}^{\infty} p_n(\alpha - \sqrt{E_s})\, d\alpha \left[ \int_{-\infty}^{\alpha} p_n(\beta)\, d\beta \right]^{M-1} \tag{4.96a} $$

with

$$ p_n(\alpha) = \frac{1}{\sqrt{\pi \mathcal{N}_0}}\, e^{-\alpha^2/\mathcal{N}_0}. \tag{4.96b} $$

Figure 4.36 Three orthogonal signals. When $\mathbf{s}_0$ is transmitted, a correct decision is
made if and only if $n_1$ and $n_2$ are both less than $\alpha = \sqrt{E_s} + n_0$. The heavy dashed lines
are the intersections of the decision boundaries with the planes $\varphi_2 = 0$ and $\varphi_1 = 0$.

From symmetry,

$$ P[\mathcal{C} \mid m_i] = P[\mathcal{C} \mid m_0] = P[\mathcal{C}], \tag{4.96c} $$

so that Eq. 4.96a is also the expression for the unconditional probability
of a correct decision.

The integral in Eq. 4.96a cannot be simplified further but has been
tabulated (reference 36) as a function of M and $E_s/\mathcal{N}_0$; a plot of $P[\mathcal{E}] = 1 - P[\mathcal{C}]$
is provided in Fig. 4.37.
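Although Eq. 4.96a has no closed form, it is straightforward to evaluate numerically. The sketch below is one way to do so and is not the tabulation method of the cited reference: it normalizes the integration variable by the noise standard deviation $\sqrt{\mathcal{N}_0/2}$, so that the integrand becomes a unit normal density centered at $\sqrt{2E_s/\mathcal{N}_0}$ multiplied by a power of the unit normal distribution function, and applies the trapezoidal rule over an assumed span of ten standard deviations on each side. The step count and span are arbitrary illustration choices.

```python
import math


def q_function(x):
    """Gaussian tail probability of the unit normal density."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def unit_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def unit_normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def p_correct_orthogonal(m, es_over_n0, steps=4000, span=10.0):
    """Trapezoidal evaluation of Eq. 4.96a in normalized units."""
    mu = math.sqrt(2.0 * es_over_n0)      # sqrt(Es) divided by sqrt(N0 / 2)
    lo, hi = mu - span, mu + span
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * unit_normal_pdf(x - mu) * unit_normal_cdf(x) ** (m - 1)
    return total * h
```

For M = 2 the result can be compared with Eq. 4.78, since $1 - P[\mathcal{C}]$ must then equal $Q(\sqrt{E_s/\mathcal{N}_0})$.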

Simplex signals. A useful application of the energy minimization ideas
discussed earlier is to M equally likely orthogonal signals. From Eqs.
4.74a and 4.92a the minimizing translation is

$$ \mathbf{a} = \mathrm{E}[\mathbf{s}] = \frac{1}{M} \sum_{i=0}^{M-1} \mathbf{s}_i = \frac{\sqrt{E_s}}{M} \sum_{i=0}^{M-1} \boldsymbol{\varphi}_i. \tag{4.97a} $$

The resulting signal set

$$ \{\mathbf{s}_i'\} \triangleq \{\mathbf{s}_i - \mathbf{a}\}; \quad i = 0, 1, \ldots, M - 1 \tag{4.97b} $$

is called a simplex and is the optimum (reference 52) (minimum $P[\mathcal{E}]$) set of M signals
for use in white Gaussian noise when energy is constrained and
$P[m_i] = 1/M$ for all i. The simplex signals for M = 2, 3, and 4 are
shown in Fig. 4.38. Since

$$ \sum_{i=0}^{M-1} \mathbf{s}_i' = \sum_{i=0}^{M-1} \mathbf{s}_i - M\mathbf{a} = 0, \tag{4.98} $$

any one of the $\{\mathbf{s}_i'\}$ can be expressed as a linear combination of the others.
The M simplex signals therefore span a space of N = M - 1 dimensions.
By virtue of the orthonormality of the $\{\boldsymbol{\varphi}_j\}$, for all i, k

$$ \mathbf{s}_i' \cdot \mathbf{s}_k' = (\mathbf{s}_i - \mathbf{a}) \cdot (\mathbf{s}_k - \mathbf{a}) $$

$$ \phantom{\mathbf{s}_i' \cdot \mathbf{s}_k'} = (\mathbf{s}_i \cdot \mathbf{s}_k) - \mathbf{a} \cdot (\mathbf{s}_i + \mathbf{s}_k) + |\mathbf{a}|^2 $$

$$ \phantom{\mathbf{s}_i' \cdot \mathbf{s}_k'} = E_s\, \delta_{ik} - \frac{2E_s}{M} + \frac{E_s}{M} $$

$$ \phantom{\mathbf{s}_i' \cdot \mathbf{s}_k'} = \begin{cases} E_s \left( 1 - \dfrac{1}{M} \right); & \text{for } i = k \\[2mm] -\dfrac{E_s}{M}; & \text{otherwise.} \end{cases} \tag{4.99} $$

Figure 4.37 Error probability for M orthogonal signals.

Figure 4.38 Simplex signals (for M = 4, a regular tetrahedron). All $\mathbf{s}_i'$ are at distance
$\sqrt{E_s(1 - 1/M)}$ from the origin.


We see that each signal in a simplex has the same energy, which is reduced
by the factor (1 - 1/M) from that required for the orthogonal signals,
with no change in error probability. (Translations do not affect $P[\mathcal{E}]$.)
When M = 2, the saving is 3 db; for large M the saving is negligible.

Equation 4.99 may be used as the definition of a simplex. We note
that a set of M vectors $\{\mathbf{s}_i'\}$ satisfying Eq. 4.99 may be transformed to a
set of orthogonal vectors by adding a vector $\sqrt{E_s/M}\, \boldsymbol{\psi}$ to each $\mathbf{s}_i'$, where
$\boldsymbol{\psi}$ is any unit vector orthogonal to all of the $\{\mathbf{s}_i'\}$.
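The simplex construction of Eqs. 4.97 and the inner products of Eq. 4.99 can be checked with a few lines of code. This is a minimal sketch: the orthogonal signals are placed along the coordinate axes as in Eq. 4.92a, and the function names and the values M = 4, $E_s$ = 2 are assumptions chosen for illustration.

```python
import math


def simplex_from_orthogonal(m, es):
    """Subtract the centroid from M equal-energy orthogonal vectors (Eq. 4.97)."""
    amp = math.sqrt(es)
    orthogonal = [[amp if j == i else 0.0 for j in range(m)] for i in range(m)]
    centroid = [sum(vec[j] for vec in orthogonal) / m for j in range(m)]
    return [[x - c for x, c in zip(vec, centroid)] for vec in orthogonal]


def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))


m, es = 4, 2.0
simplex = simplex_from_orthogonal(m, es)
self_product = dot(simplex[0], simplex[0])     # expected Es * (1 - 1/M)
cross_product = dot(simplex[0], simplex[1])    # expected -Es / M
vector_sum = [sum(vec[j] for vec in simplex) for j in range(m)]  # expected 0
```

Note that the vectors are still written with M coordinates here; they span only M - 1 dimensions because they sum to zero, as Eq. 4.98 shows.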


Biorthogonal signals. The final specific signal configuration considered
here is the biorthogonal set, illustrated for N = 2 and 3 in Fig. 4.39.
This signal set can be obtained from an original orthogonal set of N
signals by augmenting it with the negative of each signal. Obviously,
for the biorthogonal set

$$ M = 2N. \tag{4.100} $$

We denote the additional signals by $-\mathbf{s}_j$, j = 0, 1, . . . , N - 1, and
assume each signal has energy $E_s$.

It is clear from Fig. 4.40 that the received message point is closer to $\mathbf{s}_0$
than to $-\mathbf{s}_0$ if and only if

$$ r_0 > 0. \tag{4.101a} $$

Also, r is closer to $\mathbf{s}_0$ than to $\mathbf{s}_i$ if and only if

$$ r_0 > r_i; \quad i > 0, \tag{4.101b} $$

and r is closer to $\mathbf{s}_0$ than to $-\mathbf{s}_i$ if and only if

$$ r_0 > -r_i; \quad i > 0. \tag{4.101c} $$

Figure 4.40 Biorthogonal signals. When $\mathbf{s}_0$ is transmitted, r is closer to $\pm\mathbf{s}_i$ than it is
to $\mathbf{s}_0$ if and only if $n_0$ and $n_i$ are such that one of the two heavy dashed lines is crossed.


It follows that the conditional probability of a correct decision for equally
likely messages, given that $\mathbf{s}_0$ is transmitted and that

$$ r_0 = n_0 + \sqrt{E_s} = \alpha > 0, \tag{4.102a} $$

is just

$$ P[\mathcal{C} \mid m_0, r_0 = \alpha > 0] = P[-\alpha < n_1 < \alpha,\ -\alpha < n_2 < \alpha,\ \ldots,\ -\alpha < n_{N-1} < \alpha] $$

$$ \phantom{P[\mathcal{C} \mid m_0, r_0 = \alpha > 0]} = \{P[-\alpha < n < \alpha]\}^{N-1} $$

$$ \phantom{P[\mathcal{C} \mid m_0, r_0 = \alpha > 0]} = \left[ \int_{-\alpha}^{\alpha} p_n(\beta)\, d\beta \right]^{N-1}. \tag{4.102b} $$

The notation is that of Eq. 4.96b. Multiplying by $p_{r_0}(\alpha) = p_n(\alpha - \sqrt{E_s})$
and integrating over $\alpha$ from 0 to $\infty$ (because of the condition of Eq.
4.102a), we obtain

$$ P[\mathcal{C} \mid m_0] = \int_{0}^{\infty} p_n(\alpha - \sqrt{E_s})\, d\alpha \left[ \int_{-\alpha}^{\alpha} p_n(\beta)\, d\beta \right]^{N-1}. \tag{4.103} $$

Once again, by virtue of symmetry and the equal a priori probability of
the $\{m_i\}$, Eq. 4.103 is also the expression for $P[\mathcal{C}]$. Noting that N - 1 =
(M/2) - 1, we have†

$$ P[\mathcal{C}] = \int_{0}^{\infty} p_n(\alpha - \sqrt{E_s})\, d\alpha \left[ 1 - 2 \int_{\alpha}^{\infty} p_n(\beta)\, d\beta \right]^{(M/2)-1}. \tag{4.104} $$

The difference in error performance for M biorthogonal and M orthogonal
signals is negligible when M and $E_s/\mathcal{N}_0$ are large, but the number
of dimensions required is reduced by one half in the biorthogonal case.
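Equation 4.104 can be evaluated numerically in the same way as the orthogonal case. The sketch below normalizes by the noise standard deviation $\sqrt{\mathcal{N}_0/2}$, so that the inner tail integral becomes Q of the normalized variable, and applies the trapezoidal rule from 0 to an assumed upper limit ten standard deviations above the signal mean. It is an illustration, not the tabulation method of the cited reference, and the step count is an arbitrary choice.

```python
import math


def q_function(x):
    """Gaussian tail probability of the unit normal density."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def unit_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def p_correct_biorthogonal(m, es_over_n0, steps=4000, span=10.0):
    """Trapezoidal evaluation of Eq. 4.104 in normalized units (m must be even)."""
    mu = math.sqrt(2.0 * es_over_n0)      # sqrt(Es) divided by sqrt(N0 / 2)
    upper = mu + span
    h = upper / steps
    exponent = m // 2 - 1
    total = 0.0
    for k in range(steps + 1):
        x = k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * unit_normal_pdf(x - mu) * (1.0 - 2.0 * q_function(x)) ** exponent
    return total * h
```

For M = 2 the biorthogonal set is just an antipodal pair, so the result should agree with Eq. 4.77, which gives a convenient check on the sketch.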

Completely Symmetric Signal Sets and A Priori Knowledge 

In almost all of the specific cases we have considered (in particular,
the binary, orthogonal, simplex, biorthogonal, and vertices-of-a-hypercube
signal sets) the error probability calculation is greatly simplified by the
“complete symmetry” of the geometrical configuration of the $\{\mathbf{s}_i\}$. By
complete symmetry we mean that any relabeling of the signal points can
be undone by a rotation of coordinates, translation, and/or inversion of
axes. As a counterexample, the signals of Fig. 4.34 are not completely
symmetric.

Given complete symmetry, the condition

$$ P[m_i] = \frac{1}{M}; \quad \text{for all } i \tag{4.105} $$

leads to congruent decision regions $\{I_i\}$ and thus to a conditional probability
of correct decision that is independent of the particular signal
transmitted:

$$ P[\mathcal{C} \mid m_i] = \text{a constant}; \quad \text{for all } i. \tag{4.106a} $$

If such a congruent-decision-region receiver is used with message
probabilities $\{P[m_i]\}$ that are not all the same, the resulting probability
of correct decision is

$$ P[\mathcal{C}] = \sum_{i=0}^{M-1} P[m_i]\, P[\mathcal{C} \mid m_i] = P[\mathcal{C} \mid m_0], \tag{4.106b} $$

which is unchanged from the equally likely message case. Thus the error
performance of a congruent-decision-region receiver is invariant to the
actual source statistics. (Of course, if the source statistics are
known in advance, the probability of correct decision can be increased by
the use of a noncongruent-decision-region receiver designed in accordance
with Eq. 4.71b.)

† The integral of Eq. 4.104 is tabulated and plotted in reference 36.

Invariance to message probabilities can be exploited by a communication
system designer, who seldom knows in advance the exact input statistics
of the source. If the transmitter is designed with completely symmetric
signals and an optimum receiver is designed on the assumption that all
messages are equally likely, Eqs. 4.106 will be satisfied and the error
probability of the system can be specified independent of the message
source to which it is connected. A receiver designed to be optimum under
the assumption of equally likely messages is called a maximum likelihood
receiver. (See also the discussion following Eq. 4.9.)

Minimax receivers. The foregoing discussion provides a powerful
argument in support of a design assumption that all a priori message
probabilities are equal. Even more cogently, with completely symmetric
signals this assumption leads to a receiver design that is minimax, a term
we now define.

For a fixed transmitter and channel, the probability of error depends
only on the receiver and the message probabilities. For a given receiver
(with transmitter and channel fixed) the probability of error depends
only on the message source statistics and reaches a maximum value for
some choice of these statistics. This maximum value of the $P[\mathcal{E}]$ is a
useful criterion of goodness for the receiver in the absence of a priori
knowledge of the $\{P[m_i]\}$: it represents a guaranteed minimum performance
level beneath which the system will never operate, regardless
of the statistics of the message source to which it may be connected. With
this criterion, the receiver with the smallest maximum $P[\mathcal{E}]$ is most desirable.
It is called the minimax receiver.

The argument that the maximum likelihood receiver is minimax when
the $\{\mathbf{s}_i\}$ are completely symmetric is very simple. First, this receiver
yields a probability of error that is independent of the actual $\{P[m_i]\}$
with which it may be used. Second, by the definition of optimum, any
other receiver yields a greater probability of error when used with equally
likely signals, hence must have a larger maximum. This concludes the
proof.

Union Bound on the Probability of Error 

An approximation to the $P[\mathcal{E} \mid m_i]$ for any set of M equally likely
signals $\{\mathbf{s}_i\}$ in white Gaussian noise is obtained by noting that an error
occurs when $\mathbf{s}_i$ is transmitted if and only if the received vector r is closer
to at least one signal $\mathbf{s}_k$, $k \ne i$, than it is to $\mathbf{s}_i$. If $\mathcal{E}_{ik}$ is used to denote
the event that r is closer to $\mathbf{s}_k$ than to $\mathbf{s}_i$ when $\mathbf{s}_i$ is transmitted, we have

$$ P[\mathcal{E} \mid m_i] = P[\mathcal{E}_{i0} \cup \mathcal{E}_{i1} \cup \cdots \cup \mathcal{E}_{i,i-1} \cup \mathcal{E}_{i,i+1} \cup \cdots \cup \mathcal{E}_{i,M-1}]. \tag{4.107} $$

From Eq. 2.10 the probability of a finite union of events is bounded above
by the sum of the probabilities of the constituent events, a result made
geometrically evident in Fig. 4.41. Thus

$$ P[\mathcal{E} \mid m_i] \le \sum_{\substack{k=0 \\ (k \ne i)}}^{M-1} P[\mathcal{E}_{ik}]. \tag{4.108} $$

Note that $P[\mathcal{E}_{ik}]$ is not in general equal to $P[\hat{m} = m_k \mid m_i]$, because the
latter is the probability that $\mathbf{r} = \mathbf{s}_i + \mathbf{n}$ is closer to $\mathbf{s}_k$ than to every other
signal vector. To emphasize that $P[\mathcal{E}_{ik}]$ depends only on two vectors, $\mathbf{s}_i$
and $\mathbf{s}_k$, hereafter we write $P_2[\mathbf{s}_i, \mathbf{s}_k]$ in place of $P[\mathcal{E}_{ik}]$. Equation 4.108
then becomes

$$ P[\mathcal{E} \mid m_i] \le \sum_{\substack{k=0 \\ (k \ne i)}}^{M-1} P_2[\mathbf{s}_i, \mathbf{s}_k]. \tag{4.109} $$

Figure 4.41 Venn diagram. It is apparent that $P[A \cup B \cup C] \le P[A] + P[B] + P[C]$.

We next observe that $P_2[\mathbf{s}_i, \mathbf{s}_k]$ is just the probability of error for a
system that uses the vectors $\mathbf{s}_i$ and $\mathbf{s}_k$ as signals to communicate one of two
equally likely messages. The bound of Eq. 4.109, and the interpretation
of $P_2[\mathbf{s}_i, \mathbf{s}_k]$, holds for channels more general than that of additive Gaussian
noise. For the Gaussian channel, however, the expression for $P_2[\mathbf{s}_i, \mathbf{s}_k]$
is particularly simple; from Eq. 4.76b, we have

$$ P_2[\mathbf{s}_i, \mathbf{s}_k] = Q\!\left( \frac{|\mathbf{s}_i - \mathbf{s}_k|}{\sqrt{2\mathcal{N}_0}} \right). \tag{4.110} $$

The union bound of Eq. 4.109 is especially useful when the signal set
$\{\mathbf{s}_i\}$ is completely symmetric, for in this case the unconditioned error
probability $P[\mathcal{E}]$ equals $P[\mathcal{E} \mid m_i]$ and most of the terms $\{P_2[\mathbf{s}_i, \mathbf{s}_k]\}$ are
identical. The following examples illustrate the application of the bound.

Orthogonal Signals:

$$ P[\mathcal{E}] = P[\mathcal{E} \mid m_i] \le (M - 1)\, Q\!\left( \sqrt{E_s/\mathcal{N}_0} \right). \tag{4.111} $$

Biorthogonal Signals:

$$ P[\mathcal{E}] = P[\mathcal{E} \mid m_i] \le (M - 2)\, Q\!\left( \sqrt{E_s/\mathcal{N}_0} \right) + Q\!\left( \sqrt{2E_s/\mathcal{N}_0} \right). \tag{4.112} $$

In many instances the union bound is a useful approximation to the actual
$P[\mathcal{E}]$. It becomes increasingly tight for fixed M as $E_s/\mathcal{N}_0$ is increased.
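The two union bounds are simple to compute. The following minimal sketch evaluates Eqs. 4.111 and 4.112 with the Gaussian tail function computed from the standard library; the function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability of the unit normal density."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def union_bound_orthogonal(m, es_over_n0):
    """Eq. 4.111: M - 1 competitors, each at distance sqrt(2 Es)."""
    return (m - 1) * q_function(math.sqrt(es_over_n0))


def union_bound_biorthogonal(m, es_over_n0):
    """Eq. 4.112: M - 2 orthogonal competitors plus the one antipodal signal."""
    return ((m - 2) * q_function(math.sqrt(es_over_n0))
            + q_function(math.sqrt(2.0 * es_over_n0)))
```

For M = 2 each sum contains a single term, so the bounds coincide with the exact results of Eqs. 4.78 and 4.77 respectively. For larger M the bound can exceed unity at low $E_s/\mathcal{N}_0$, in which case it conveys no information.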


APPENDIX 4A ORTHONORMAL EXPANSIONS 
AND VECTOR REPRESENTATIONS 

When one of M signals $\{s_i(t)\}$ is communicated over an additive white
Gaussian noise channel, the vector receiver to which the optimum waveform
receiver reduces does not depend on the specific waveshapes of the
N orthonormal base functions $\{\varphi_j(t)\}$. Only the vectors $\{\mathbf{s}_i\}$ are important;
the particular set $\{\varphi_j(t)\}$ used to generate the signals $\{s_i(t)\}$ has no effect
on the decision rule (Eq. 4.53), hence on the receiver error probability.
In the design of communication systems for use in white Gaussian noise,
the problem is to choose a good set of vectors $\{\mathbf{s}_i\}$ and a convenient set of
functions $\{\varphi_j(t)\}$ that will propagate satisfactorily over the channel.

To prove that the transmitter structure of Fig. 4.12 and the correlation
and matched filter receivers of Figs. 4.18 and 4.19 are completely general,
we must show that any set of M finite-energy waveforms can always
be expressed as

$$ s_i(t) = \sum_{j=1}^{N} s_{ij}\, \varphi_j(t); \quad i = 0, 1, \ldots, M - 1, \tag{4A.1a} $$

in which the waveforms $\{\varphi_j(t)\}$ are an appropriately chosen set of orthonormal
functions:

$$ \int_{-\infty}^{\infty} \varphi_l(t)\, \varphi_j(t)\, dt = \delta_{lj}; \quad 1 \le l, j \le N. \tag{4A.1b} $$

In this appendix we prove the generality of Eq. 4A.1 and discuss some of
its implications.

The Gram-Schmidt orthogonalization procedure. One convenient way
in which an appropriate orthonormal set $\{\varphi_j(t)\}$ can be obtained from any
given signal set $\{s_i(t)\}$ is by the Gram-Schmidt (reference 43) orthogonalization
procedure described in the following sequence of steps.

1. First consider $s_0(t)$. If $s_0(t) \equiv 0$ (has zero energy), renumber the
signals. For $s_0(t) \not\equiv 0$, set

$$ \varphi_1(t) = \frac{s_0(t)}{\sqrt{E_0}}, \tag{4A.2a} $$

where

$$ E_0 = \int_{-\infty}^{\infty} s_0^2(t)\, dt. \tag{4A.2b} $$

Then $\varphi_1(t)$ is a waveform with unit energy. Since $s_0(t) = \sqrt{E_0}\, \varphi_1(t)$, the
coefficient $s_{01} = \sqrt{E_0}$. The associated vector $\mathbf{s}_0$ is shown in Fig. 4A.1a.



(d) 


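Figure 4A.1 Vectors obtained by the Gram-Schmidt procedure: $M = 4$, $N = 3$.
Here $\mathbf{s}_2$ can be expressed as a linear combination of $\varphi_1$ and $\varphi_2$, so that $\theta_2(t) \equiv 0$.

2. Second, define the auxiliary function $\theta_1(t)$ as

$$\theta_1(t) = s_1(t) - s_{11}\,\varphi_1(t), \tag{4A.3a}$$

where

$$s_{11} = \int_{-\infty}^{\infty} s_1(t)\,\varphi_1(t)\,dt. \tag{4A.3b}$$

If $\theta_1(t) \not\equiv 0$, set

$$\varphi_2(t) = \frac{\theta_1(t)}{\sqrt{E_{\theta_1}}}, \tag{4A.3c}$$

where

$$E_{\theta_1} = \int_{-\infty}^{\infty} \theta_1^2(t)\,dt. \tag{4A.3d}$$

Then $\varphi_2(t)$ also has unit energy, and $s_{12} = \sqrt{E_{\theta_1}}$. Furthermore,

$$\int_{-\infty}^{\infty} \varphi_2(t)\,\varphi_1(t)\,dt = 0, \tag{4A.3e}$$

which follows from the equations

$$\begin{aligned}
\sqrt{E_{\theta_1}} \int_{-\infty}^{\infty} \varphi_2(t)\,\varphi_1(t)\,dt
&= \int_{-\infty}^{\infty} \theta_1(t)\,\varphi_1(t)\,dt \\
&= \int_{-\infty}^{\infty} [s_1(t) - s_{11}\,\varphi_1(t)]\,\varphi_1(t)\,dt \\
&= \int_{-\infty}^{\infty} s_1(t)\,\varphi_1(t)\,dt - s_{11} \int_{-\infty}^{\infty} \varphi_1^2(t)\,dt \\
&= s_{11} - s_{11} = 0.
\end{aligned}$$

The vector $\mathbf{s}_1$ is shown in Fig. 4A.1b under the assumption that $\theta_1(t) \not\equiv 0$.
If $\theta_1(t) \equiv 0$, proceed to (3).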

3. The general step in the procedure is as follows. Assume that $(l - 1)$
orthonormal waveforms $\varphi_1(t), \ldots, \varphi_{l-1}(t)$ have been defined
through the use of $s_0(t), s_1(t), \ldots, s_{k-1}(t)$. It is clear that $(l - 1) \le k$,
since each new signal introduces at most one new orthonormal function.
Now consider $s_k(t)$ and define the auxiliary function

$$\theta_k(t) = s_k(t) - \sum_{j=1}^{l-1} s_{kj}\,\varphi_j(t), \tag{4A.4a}$$

$$s_{kj} = \int_{-\infty}^{\infty} s_k(t)\,\varphi_j(t)\,dt; \qquad j = 1, 2, \ldots, l - 1. \tag{4A.4b}$$

If $\theta_k(t) \not\equiv 0$, set

$$\varphi_l(t) = \frac{\theta_k(t)}{\sqrt{E_{\theta_k}}}, \tag{4A.4c}$$

$$E_{\theta_k} = \int_{-\infty}^{\infty} \theta_k^2(t)\,dt. \tag{4A.4d}$$

Clearly, $\varphi_l(t)$ has unit energy, and $s_{kl} = \sqrt{E_{\theta_k}}$. Also,

$$\int_{-\infty}^{\infty} \varphi_l(t)\,\varphi_m(t)\,dt = 0; \qquad \text{for } 1 \le m \le l - 1, \tag{4A.4e}$$

which follows from the equations

$$\begin{aligned}
\sqrt{E_{\theta_k}} \int_{-\infty}^{\infty} \varphi_l(t)\,\varphi_m(t)\,dt
&= \int_{-\infty}^{\infty} \theta_k(t)\,\varphi_m(t)\,dt \\
&= \int_{-\infty}^{\infty} \Big[ s_k(t) - \sum_{j=1}^{l-1} s_{kj}\,\varphi_j(t) \Big] \varphi_m(t)\,dt \\
&= \int_{-\infty}^{\infty} s_k(t)\,\varphi_m(t)\,dt - \sum_{j=1}^{l-1} s_{kj} \int_{-\infty}^{\infty} \varphi_j(t)\,\varphi_m(t)\,dt \\
&= s_{km} - \sum_{j=1}^{l-1} s_{kj}\,\delta_{jm} \\
&= s_{km} - s_{km} = 0; \qquad 1 \le m \le l - 1.
\end{aligned}$$

The foregoing procedure can be continued until all $M$ signals $\{s_i(t)\}$
have been exhausted, as shown in Figs. 4A.1c, d. There will then have
been established $N \le M$ orthonormal waveforms $\{\varphi_j(t)\}$, with the equality
holding if and only if all $M$ signals are linearly independent; that is, if
and only if no one signal can be expressed as a linear combination of the
others. The integer $N$ is called the dimensionality of the signal space
defined by the $\{s_i(t)\}$. By the nature of the construction, it is clear that
each $s_i(t)$, $i = 0, 1, \ldots, M - 1$, can indeed be expressed as a linear
combination of the $\{\varphi_j(t)\}$ and thus that Eq. 4A.1 is satisfied.

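A simple example of the Gram-Schmidt procedure is provided by the
four waveforms shown in Fig. 4A.2. Starting with $s_0(t)$, we have

$$E_0 = 4 + 4 + 4 = 12, \qquad \varphi_1(t) = \frac{s_0(t)}{\sqrt{12}}, \qquad \text{and} \qquad s_{01} = \sqrt{12}.$$

Figure 4A.2 An illustration of the Gram-Schmidt procedure.

Introducing $s_2(t)$, we obtain

$$s_{21} = \int_{-\infty}^{\infty} s_2(t)\,\varphi_1(t)\,dt = \sqrt{3},$$

$$s_{22} = \int_{-\infty}^{\infty} s_2(t)\,\varphi_2(t)\,dt = -\sqrt{2},$$

$$\theta_2(t) = s_2(t) - \sqrt{3}\,\varphi_1(t) + \sqrt{2}\,\varphi_2(t) \equiv 0.$$

Finally, introducing $s_3(t)$, we have

$$s_{31} = \int_{-\infty}^{\infty} s_3(t)\,\varphi_1(t)\,dt = -\sqrt{3},$$

$$s_{32} = \int_{-\infty}^{\infty} s_3(t)\,\varphi_2(t)\,dt = -2\sqrt{2} = -\sqrt{8},$$

$$\theta_3(t) = s_3(t) + \sqrt{3}\,\varphi_1(t) + \sqrt{8}\,\varphi_2(t) \equiv 0.$$

Thus the four signals $\{s_i(t)\}$ span a space of two dimensions, and the
vector representations are

$$\begin{aligned}
s_0(t) &= \sqrt{12}\,\varphi_1(t) &&\leftrightarrow\quad \mathbf{s}_0 = (\sqrt{12},\, 0), \\
s_1(t) &= -\sqrt{3}\,\varphi_1(t) + \sqrt{8}\,\varphi_2(t) &&\leftrightarrow\quad \mathbf{s}_1 = (-\sqrt{3},\, \sqrt{8}), \\
s_2(t) &= +\sqrt{3}\,\varphi_1(t) - \sqrt{2}\,\varphi_2(t) &&\leftrightarrow\quad \mathbf{s}_2 = (\sqrt{3},\, -\sqrt{2}), \\
s_3(t) &= -\sqrt{3}\,\varphi_1(t) - \sqrt{8}\,\varphi_2(t) &&\leftrightarrow\quad \mathbf{s}_3 = (-\sqrt{3},\, -\sqrt{8}),
\end{aligned} \tag{4A.5}$$

as shown in Fig. 4A.3.

Figure 4A.3 A vector representation of the $\{s_i(t)\}$ of Fig. 4A.2.

Figure 4A.4 An alternative vector diagram for the $\{s_i(t)\}$ of Fig. 4A.2.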
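
The procedure of steps 1 to 3 can be sketched in a few lines for waveforms that are piecewise constant on unit intervals, so that each integral reduces to a sum. The sketch is illustrative only. The function names are hypothetical, and the four sample waveforms are an assumption: Fig. 4A.2 is not reproduced here, so they were chosen merely to be consistent with the energies and coefficients quoted in the example.

```python
import math

# Hypothetical piecewise-constant waveforms on three unit intervals,
# chosen to match the coefficients quoted in the example (an assumption).
HYPOTHETICAL_WAVEFORMS = [
    [2.0, 2.0, 2.0],     # s_0(t), energy 12
    [-3.0, -1.0, 1.0],   # s_1(t)
    [2.0, 1.0, 0.0],     # s_2(t)
    [1.0, -1.0, -3.0],   # s_3(t)
]


def inner(x, y, dt=1.0):
    """Approximate the integral of x(t) y(t) dt for sampled waveforms."""
    return sum(a * b for a, b in zip(x, y)) * dt


def gram_schmidt(signals, dt=1.0, tol=1e-9):
    """Return (basis, coeffs) for the given sampled signals.

    basis  : list of orthonormal sampled functions phi_1, ..., phi_N
    coeffs : coeffs[i][j] is the projection of signal i on basis function j
    """
    basis = []
    for s in signals:
        # Projections on the functions defined so far (Eq. 4A.4b).
        proj = [inner(s, phi, dt) for phi in basis]
        # Auxiliary function theta: the signal minus its projections (Eq. 4A.4a).
        theta = list(s)
        for c, phi in zip(proj, basis):
            theta = [t - c * p for t, p in zip(theta, phi)]
        energy = inner(theta, theta, dt)
        # A new function is introduced only when theta is not (numerically) zero.
        if energy > tol:
            basis.append([t / math.sqrt(energy) for t in theta])
    coeffs = [[inner(s, phi, dt) for phi in basis] for s in signals]
    return basis, coeffs


if __name__ == "__main__":
    basis, coeffs = gram_schmidt(HYPOTHETICAL_WAVEFORMS)
    print("dimensionality N =", len(basis))
    for i, vec in enumerate(coeffs):
        print("s_%d ->" % i, [round(c, 4) for c in vec])
```

Zero-energy signals are simply skipped rather than renumbered, which yields the same basis. Feeding the signals in a different order produces a different basis but the same distances between signal points.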




We have shown that it is always possible to represent a finite set of
signals $\{s_i(t)\}$ by means of at least one finite weighted sum of orthonormal
functions $\{\varphi_j(t)\}$ and therefore that the derivation of the optimum
receivers of Figs. 4.18 and 4.19 is always valid.

Note that any given set $\{s_i(t)\}$ can be expanded in many different
orthonormal sets, all of which ultimately yield the same receiver, hence
the same decisions and the same probability of error. For example, if the
Gram-Schmidt procedure for the signals of Fig. 4A.2 were carried out by
considering signals in the order $s_1(t)$, $s_2(t)$, $s_3(t)$, $s_0(t)$, a different pair of
orthonormal functions $\varphi_1'(t)$, $\varphi_2'(t)$, and a different set of coefficients
$\{s_{ij}'\}$ would have been obtained. In particular, $\mathbf{s}_1$ would lie on the
$\varphi_1'$-axis and $\mathbf{s}_2$ would have a positive projection on the $\varphi_2'$-axis, as shown in
Fig. 4A.4. Alternatively, a set $\{\varphi_j''(t)\}$ might be obtained without use of
the Gram-Schmidt procedure, although the resulting number of functions
might be larger than the dimensionality, $N$. Such a set is shown in Fig.
4A.5a and the corresponding vectors in Fig. 4A.5b. Note that the four
signal points remain coplanar and have the same relative positions. The
important fact is that the signal points $\{\mathbf{s}_i\}$ always retain the same
geometrical configuration, regardless of the particular set of coordinates in
terms of which they are described.



PROBLEMS 

4.1 The random variable $n$ in Fig. P4.1a is Gaussian, with zero mean. If one of
two equally likely messages is transmitted, using the signals of Fig. P4.1b, an
optimum receiver yields $P[\mathcal{E}] = 0.01$.

Figure P4.1 [panels (a) to (d); signal points $s_0, s_1, s_2$ at $-4, 0, +4$ in (c) and $s_0, s_1, s_2, s_3$ at $-4, 0, +4, +8$ in (d)]

Figure 4A.5 A third vector representation of the signals of Fig. 4A.2.

a. What is the minimum attainable probability of error, $P[\mathcal{E}]_{\min}$, when the
channel of Fig. P4.1a is used with three equally likely messages and the signals
of (c)? With four equally likely messages and the signals of (d)?

b. How do the answers to part (a) change if it is known that $\bar{n} = 1$ rather
than 0?

4.2 One of four equally likely messages is to be communicated over a vector
channel which adds a (different) statistically independent zero-mean Gaussian
random variable with variance $\mathcal{N}_0/2$ to each transmitted vector component.
Assume that the transmitter uses the signal vectors shown in Fig. P4.2 and express
the $P[\mathcal{E}]$ produced by an optimum receiver in terms of the function $Q(\alpha)$.

Figure P4.2


4.3 It is known that $P[\mathcal{E}]_{\min} = q$ when the two signal vectors $\mathbf{s}_0$ and $\mathbf{s}_1$ shown
in Fig. P4.3a are transmitted with equal probability over a channel disturbed by
additive white Gaussian noise. Compute $P[\mathcal{E}]_{\min}$ in terms of $q$, $\theta$, and $l$ when the
nine vectors indicated by ×'s in Fig. P4.3b are used as signals with equal
probability over the same channel.

Figure P4.3

4.4 One of 16 equally likely messages is to be communicated over an additive
Gaussian noise channel with $\mathcal{S}_n(f) = \mathcal{N}_0/2$. The transmitter utilizes a signal
set $\{s_i(t)\}$ whose vector representation is indicated by ×'s in Fig. P4.4.

a. Draw the optimum decision regions.

b. Determine $P[\mathcal{E}]_{\min}$ in terms of $Q(\alpha)$.

c. Find a set of 16 two-dimensional signal vectors (not necessarily optimum)
such that the transmitted energy is never greater than $E_s$ but for which the
attainable $P[\mathcal{E}]$ is less than the answer to part (b).

Figure P4.4 [signal-point spacing labeled $d$]

4.5 One of the two signals $s_0 = -1$, $s_1 = +1$ is transmitted over the channel
shown in Fig. P4.5a. The two noise random variables $n_1$ and $n_2$ are statistically
independent of the transmitted signal and of each other. Their density functions
are

$$p_{n_1}(\alpha) = p_{n_2}(\alpha) = \tfrac{1}{2}\, e^{-|\alpha|}.$$

a. Prove that the optimum decision regions for equally likely messages are
as shown in Fig. P4.5b. Hint. Use geometric reasoning and the fact that
$|\rho_1 - 1| + |\rho_2 - 1| = a + b$, as shown in Fig. P4.5d.

b. A receiver decides that $s_1$ was transmitted if and only if $(r_1 + r_2) > 0$. Is
this receiver optimum for equally likely messages? What is its probability of
error?

c. Prove that the optimum decision regions are modified as indicated in
Fig. P4.5c when $P[s_1] > \tfrac{1}{2}$.

d. The channel may be discarded without affecting $P[\mathcal{E}]_{\min}$ if $P[s_1] > q$.
Evaluate $q$.

Figure P4.5 [panels (a) to (d)]



4.6 In the communication system diagrammed in Fig. P4.6a, the transmitted
signal $s$ and the noises $n_1$ and $n_2$ are all random voltages and all statistically
independent. Assume that

$$P[m_0] = P[m_1] = \tfrac{1}{2},$$

$$s_1 = -s_0 = \sqrt{E_s},$$

$$p_{n_1}(\alpha) = p_{n_2}(\alpha) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\alpha^2/2\sigma^2}.$$

a. Show that the optimum receiver can be realized as diagrammed in Fig.
P4.6b, where $a$ is an appropriately chosen constant.

b. What is the optimum value of $a$?

c. What is the optimum threshold setting?

d. Express the resulting $P[\mathcal{E}]$ in terms of $Q(\alpha)$.

e. By what factor would $E_s$ have to be increased to yield this same probability
of error if the receiver were restricted to observing only $r_1$?

Figure P4.6 [panels (a) and (b)]

4.7 The voltage waveforms x(t) and y(t), plotted below, have the properties 
that when applied across a 1-ohm resistor 
s.

These signals can be used to communicate one of two equally likely messages 
over a channel perturbed by additive white Gaussian noise with power density 
of 4 watts/cycle/sec (on a bilateral frequency scale). 

a. Calculate the minimum attainable probability of error when the two 
signals used are *(0 and 

b. Calculate the minimum attainable probability of error when the two 
signals used are -'</) and y(t). 

4.8 a. Calculate $P[\mathcal{E}]_{\min}$ when the signal sets specified by Figs. P4.8a, b, and c
are used to communicate one of two equally likely messages over a channel
disturbed by additive Gaussian noise with $\mathcal{S}_n(f) = 0.15$.
b. Repeat part (a) for a priori message probabilities (j, |). 


Figure P4.8

4.9 Express $P[\mathcal{E}]_{\min}$ in terms of $Q(\alpha)$ when the signal set shown in Fig. P4.9 is
used to communicate one of eight equally likely messages over a channel
disturbed by additive Gaussian noise with $\mathcal{S}_n(f) = \mathcal{N}_0/2$.

Figure P4.9


4.10 One of two equally likely messages is to be transmitted over an additive
white Gaussian noise channel with $\mathcal{S}_n(f) = 0.05$ by means of binary pulse
position modulation. Specifically,

$$s_0(t) = p(t),$$

$$s_1(t) = p(t - 2),$$

in which the pulse $p(t)$ is shown in Fig. P4.10.

a. What mathematical operations are performed by the optimum receiver?

b. What is the resulting probability of error?

c. Indicate two methods of implementing the receiver, each of which uses a
single linear filter followed by a sampler and comparison device. Method I
requires that two samples from the filter output be fed into the comparison
device. Method II requires that just one sample be used. For each method
sketch the impulse response of the appropriate filter and its response to $p(t)$.
Which of these methods is most easily extended to $M$-ary pulse position
modulation, where $s_i(t) = p(t - 2i)$, $i = 0, 1, \ldots, M - 1$?

d. Suggest another pair of waveforms that require the same energy as the
binary pulse-position waveforms and yield the same error probability. Suggest
a pair that yield a lower error probability.

e. Calculate the minimum attainable probability of error if

$$s_0(t) = p(t) \quad \text{and} \quad s_1(t) = p(t - 1).$$

Repeat for

$$s_0(t) = p(t) \quad \text{and} \quad s_1(t) = -p(t - 1).$$

Figure P4.10

4.11 One of two equally likely messages, $m_0$ or $m_1$, is to be transmitted over an
additive white Gaussian noise channel by means of the two signals

0<,<T 
0; elsewhere, 

JiW = (Vt cos2 ^ + A) ' : 0<leiT 

y 0; elsewhere, 

where $T = 2$ msec, $f_0 = 1$ Mc, and $\Delta = 250$ cps. The noise has power density
spectrum $\mathcal{N}_0/2$. If $E_s/\mathcal{N}_0 = 6$, calculate the probability of error to two
significant digits. Repeat for $\Delta = 500$ cps.

4.12 $M$ signals $s_0(t), s_1(t), \ldots, s_{M-1}(t)$ exist for $0 \le t \le T$, but each is identical
to all others in the subinterval $[t_1, t_2]$, where $0 \le t_1 < t_2 \le T$.

a. Show that the optimum receiver may ignore this subinterval. Equivalently,
show that if $\mathbf{s}_0, \mathbf{s}_1, \ldots, \mathbf{s}_{M-1}$ all have the same projection in one dimension, then
this dimension may be ignored. Assume an additive white Gaussian noise
channel.

b. Does this result necessarily hold true if the noise is Gaussian but not white?
Explain.

4.13 Consider the multipath communication model shown in Fig. P4.13a, for
which $P[m_0] = \tfrac{1}{2}$. Assume that the three paths are characterized by the following
parameters:

Constant attenuation

$\alpha_1 = 0.2$

tf 2 

II 

o 

$\alpha_3 = 0.6$.

Constant delay

$\tau_1 = 1$ msec

$\tau_2 = 1.5$ msec

$\tau_3 = 2$ msec.

White noise power density $\mathcal{S}_{n_1}(f) = 0.002$, $\mathcal{S}_{n_2}(f) = 0.006$, $\mathcal{S}_{n_3}(f) = 0.004$.

The three noise processes are Gaussian and statistically independent of each
other and the signal transmitted. The transmitter is defined by the mapping

$$m = m_0 \iff s(t) = s_0(t) = \begin{cases} 5\cos 2\pi 10^3 t; & 0 \le t \le 3 \times 10^{-3} \\ 0; & \text{elsewhere.} \end{cases}$$

m = s{t) — 



Figure P4.13 [panels (a) to (c); the filter output in (b) is sampled at $t = T_1$]

a. Show that the optimum receiver can be realized in the form illustrated in
Fig. P4.13b. Determine $h_1(t)$, $T_1$, and the specification of the decision device.
Suggest a reasonable implementation for $h_1(t)$. Calculate $P[\mathcal{E}]$ to two significant
digits.

b. Now assume that the receiver has access to the three multipath outputs
individually. Demonstrate that in this case (called diversity reception) the
optimum receiver can be realized in the form shown in Fig. P4.13c, in which the
$\Delta$'s are constant delays and the $a$'s are constant multipliers. Determine the $\Delta$'s,
the $a$'s, $h_2(t)$, $T_2$, and the specification of the decision device. Calculate the
probability of error to two significant digits.

4.14 Specify a matched filter for each of the signals shown in Fig. P4.14 and 
sketch each filter output as a function of time when the signal matched to it is 
the input. Sketch the output of the filter matched to $s_2(t)$ when the input is
$s_1(t)$.


are used with equal probability over an additive white Gaussian noise channel. 
The receiver bases its decision solely on observation of the received process 


$$r(t) = s(t) + n_w(t)$$

over the restricted interval $0 \le t \le 2$. Express the minimum attainable $P[\mathcal{E}]$ in
terms of $Q(\alpha)$. Contrast numerically with the performance of an optimum
receiver that observes all of $r(t)$, $-\infty < t < \infty$.

4.16 A transmitter uses the signals $\{s_i(t)\}$ to communicate one of $M > 2$
equally likely messages over an additive white Gaussian noise channel with
power density $\mathcal{N}_0/2$, where for $i = 0, 1, \ldots, M - 1$

$$s_i(t) = \begin{cases} \sqrt{\dfrac{2E_s}{T}}\, \cos\!\left( 2\pi \dfrac{k}{T}\, t + \dfrac{2\pi i}{M} \right); & 0 \le t \le T,\ k \text{ an integer} \\ 0; & \text{elsewhere.} \end{cases}$$

a. Sketch the signal vectors and optimum decision regions for $M = 5$.

b. Use geometric arguments to show that the minimum attainable $P[\mathcal{E}]$ is
bounded by

$$p \le P[\mathcal{E}] \le 2p,$$

where

$$p = Q\!\left( \sqrt{\frac{2E_s}{\mathcal{N}_0}}\, \sin\frac{\pi}{M} \right).$$

[This very neat result is due to E. Arthurs and H. Dym.$^4$]

4.17 Assume that a set $\{\boldsymbol{\theta}_i\}$ of $M$ vectors satisfies the equations

$$\boldsymbol{\theta}_i \cdot \boldsymbol{\theta}_j = \begin{cases} 1; & i = j \\ \rho; & i \ne j. \end{cases}$$

a. Prove that $1 \ge \rho \ge -1/(M - 1)$, where the right-hand equality is satisfied
by the unit-energy simplex. Hint. Consider

$$\Big| \sum_{i=0}^{M-1} \boldsymbol{\theta}_i \Big|^2.$$

b. Prove for any allowable $\rho$ that the signal set $\{\mathbf{s}_i\}$, with $\mathbf{s}_i = \sqrt{E_\rho}\, \boldsymbol{\theta}_i$ for all $i$,
has the same error probability as the simplex signal set with energy

$$E_s = E_\rho (1 - \rho)\left( 1 - \frac{1}{M} \right),$$

hence the same error probability as the orthogonal signal set with energy

$$E_0 = E_\rho (1 - \rho).$$

Hint. Consider the set $\{(\mathbf{s}_i - \mathbf{a})\}$, with $\mathbf{a} = \dfrac{1}{M} \sum_{j=0}^{M-1} \mathbf{s}_j$.

4.18 Either of the two signal waveform sets illustrated in Fig. P4.18 may be
used to communicate one of four equally likely messages over an additive white
Gaussian noise channel.
a. Show that both sets use the same energy.



5 


Efficient Signaling 
for Message Sequences 


Preceding chapters have dealt with the problem of communicating a 
single input message chosen at random from some finite set of possible 
inputs. In practice, however, we are not often interested in communication 
systems that transmit only a single message and then cease operation 
forever, but rather in systems that communicate a sequence of messages, 
one after another, for many years. 

Of course, we might choose to consider the transmission of a sequence
of $K$ inputs, each chosen from a set of $M$ possible messages, as the
transmission of a single input chosen from a set of $M^K$ possibilities. This is
the single-transmission, or “one-shot,” approach. Alternatively, we can
reformulate the single-transmission theory considered thus far in such a
way that the sequential nature of the communication problem will be
explicitly reflected in our analysis. In doing so we shall gain rich dividends:
the concepts of channel capacity and communication efficiency. We shall
also gain insight into the interrelationships between time, bandwidth,
probability of error, and signal-to-noise ratio. In this chapter we consider these
issues from a theoretical point of view. In the next we discuss certain
aspects of the problem of system implementation.

5.1 SEQUENTIAL SOURCES 

Given a message source that produces a sequence of discrete symbols,
we are interested in characterizing how much transmission capability is
required to communicate the source output to a distant terminal. In the
simplest case we might have a source that produces statistically
independent binary digits, each of which is equally likely to be 0 or 1, at a
uniform rate of $R$ digits/sec. During any time interval $T$ that is an integral
multiple of $1/R$, this source generates a sequence of $RT$ binary digits, and
each of the $2^{RT}$ possible sequences is equally likely to occur. For example,
if $R = \tfrac{3}{2}$ and $T = 2$, one of the eight sequences

    000    010    100    110
    001    011    101    111

is produced, and each has a priori probability $\tfrac{1}{8}$. Thus the transmitter
must be able to communicate one of

$$M = 2^{RT} \tag{5.1}$$

equally likely messages during each successive $T$-sec interval.

Source Rate 

For the situation just considered, we call $R$ the source rate,
measured in units of binary digits (abbreviated bits) per second.
Similarly, for other sources, not necessarily binary, that produce one of a set
of $M$ equally likely messages in any time interval $T$, we define the source
rate in such a way that Eq. 5.1 remains valid:

$$R = \frac{1}{T} \log_2 M \ \text{bits/sec}. \tag{5.2}$$

As an example of the application of this definition, consider a source
that generates one symbol selected from an $L$-symbol alphabet each
$1/R'$ sec. If the symbols are equally probable and successive selections are
statistically independent, in time $T$ the source effectively specifies one of

$$M = L^{R'T} \tag{5.3a}$$

equally likely messages. The source rate is therefore

$$R = \frac{1}{T} \log_2 M = R' \log_2 L \ \text{bits/sec}. \tag{5.3b}$$
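
A short numerical check of these definitions may be helpful. The sketch below is illustrative only; the function names and the sample values (10 symbols/sec from an 8-symbol alphabet) are assumptions, not taken from the text.

```python
import math


def source_rate(symbols_per_sec: float, alphabet_size: int) -> float:
    """Source rate R = R' log2(L) in bits/sec (Eq. 5.3b)."""
    return symbols_per_sec * math.log2(alphabet_size)


def messages_in_interval(rate_bits_per_sec: float, t_sec: float) -> float:
    """Number of equally likely messages M = 2^(RT) (Eq. 5.1)."""
    return 2.0 ** (rate_bits_per_sec * t_sec)


if __name__ == "__main__":
    # Binary source of the text: R = 3/2 digits/sec observed for T = 2 sec.
    print(messages_in_interval(1.5, 2.0))
    # Hypothetical source: 10 symbols/sec from an 8-symbol alphabet.
    print(source_rate(10.0, 8))
```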

To see that the rate of a source is a meaningful measure of the
transmission capability required to communicate the source output, we need
only recognize that a set of $M$ messages can be converted into a set of
binary sequences simply by numbering the original messages and writing
these numbers in binary form. For example, we might have

    Message  Message No.  Sequence        Message  Message No.  Sequence
       a          0          000             e          4          100
       b          1          001             f          5          101
       c          2          010             g          6          110
       d          3          011             h          7          111

The identity of any input message can be specified by communicating the
associated binary sequence. When $M$ is a power of 2 and each message is
equally likely, successive binary digits obtained in this way are statistically
independent and equally likely to be 0 or 1. In this text we restrict our
attention to the problem of communicating such a binary sequence. It
can be shown$^{27}$ that the restriction entails no significant loss of generality.
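
The numbering scheme is easy to mechanize. The following sketch (the function name `to_binary_sequences` is hypothetical) builds the table above and counts how often each digit position is 1, as a quick check on the claim that the digits are equally likely when $M$ is a power of 2.

```python
def to_binary_sequences(messages):
    """Number the messages 0..M-1 and write each number in binary.

    Every sequence is padded to the width needed for the largest number.
    """
    m = len(messages)
    width = max(1, (m - 1).bit_length())
    return {msg: format(i, "0{}b".format(width)) for i, msg in enumerate(messages)}


def ones_per_position(table):
    """Count how many sequences have a 1 in each digit position."""
    sequences = list(table.values())
    width = len(sequences[0])
    return [sum(seq[pos] == "1" for seq in sequences) for pos in range(width)]


if __name__ == "__main__":
    table = to_binary_sequences("abcdefgh")
    for msg, seq in table.items():
        print(msg, seq)
    # With M = 8 equally likely messages, each position is 1 in half of them.
    print(ones_per_position(table))
```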

The deep significance of source rate (which is frequently called
“information rate”) is clarified by the following considerations. Assume that we
have two independent sources, the first of which produces one of $M_1$ and
the second of which produces one of $M_2$ equally likely messages during
each interval of $T$ sec. If each source is connected to a separate
transmitter, the required transmission capabilities are, respectively,

$$R_1 = \frac{1}{T} \log_2 M_1 \ \text{bits/sec} \tag{5.4a}$$

and

$$R_2 = \frac{1}{T} \log_2 M_2 \ \text{bits/sec}. \tag{5.4b}$$

On the other hand, if both sources are connected simultaneously to a
single transmitter, it must be able to specify one of $M = M_1 M_2$ messages
in time $T$, hence must accommodate a rate of

$$R = \frac{1}{T} \log_2 M = \frac{1}{T} \log_2 M_1 M_2 = \frac{1}{T} \log_2 M_1 + \frac{1}{T} \log_2 M_2 = R_1 + R_2 \ \text{bits/sec}. \tag{5.4c}$$

The important point is that, by virtue of the logarithm in the definition of
rate, the rate of the two sources combined is the sum of their individual
rates.

The utility of a communication system is measured by the (maximum)
source rate that it will accommodate: other things being equal, one
system with rate $R$ can handle as much traffic as two systems with rate
$R/2$. In contrast, note that a system capable of transmitting one of $M$
equally likely messages per unit time is not equivalent to two such systems,
each of capability $M/2$.

Transmitter Power 

In Chapter 4, which dealt with the transmission of a single message,
we considered the selection of signals subject to a constraint on the
transmitted energy, $E_s$. We are now concerned with the transmission of a
(possibly unending) sequence of messages, so that an energy constraint is
no longer meaningful. But it is both meaningful and instructive to impose
a bound on the average transmitted power, denoted $P_s$. For a signal $s(t)$
of duration $T$ the average power is defined by

$$P_s = \frac{1}{T} \int_0^T s^2(t)\,dt. \tag{5.5}$$

Thus, a constraint on $P_s$ implies that the available transmitter energy
increases linearly with time.

If a source has rate $R$, it can be thought of as producing one binary
digit each $1/R$ sec. Subject to an average power constraint $P_s$, the average
energy available per bit, say $E_b$, is therefore

$$E_b = \frac{P_s}{R} \ \text{joules/bit}. \tag{5.6}$$

The average energy per bit required by different communication systems to
obtain a given standard of error performance is a measure of their
relative efficiencies.

5.2 BIT-BY-BIT AND BLOCK-ORTHOGONAL SIGNALING 

To see that different communication systems may yield drastically
different performances for the same value of $E_b$, let us contrast the results
achieved when a sequence of

$$K = RT \tag{5.7}$$

equally likely binary digits is communicated by two specific signaling
schemes. The first (a rather obvious choice) transmits a signal consisting
of a sequence of $K$ nonoverlapping pulse translates, each of which has
the same waveshape but is positive when the corresponding bit in the input
sequence is 1 and negative when it is 0, as shown in Fig. 5.1. The energy of
each elementary pulse is $E_b$, and the total energy expended is $KE_b$. The
second signaling scheme uses a signal set of $2^K$ orthogonal pulses, each
having energy $E_s = KE_b$. The choice of transmitted signal is made by
observing the entire input sequence at once and transmitting the $i$th pulse
when the binary number specified by this sequence is $i$.

In many applications the entire $K$-bit sequence must be transmitted
correctly. A naval fire-control system, in which a 1 for the $j$th digit could
designate that the target is above the surface and a 0 that it is below, is an
example. In such cases the sequence is considered to be communicated
correctly if and only if every one of its $K$ bits is reproduced without error at

we associate the $M = 2^K$ possible signals with the $2^K$ vertices of a
$K$-dimensional hypercube. In Chapter 4 we noted (Eq. 4.87) that the
probability of at least one error with such a signal set is

$$P[\mathcal{E}] = 1 - (1 - p)^K = 1 - (1 - p)^{RT}, \tag{5.10a}$$

in which

$$p = Q\!\left( \sqrt{\frac{2E_b}{\mathcal{N}_0}} \right) \tag{5.10b}$$

is the probability of error for a binary decision between two antipodal
signals of energy $E_b$ in additive white Gaussian noise with power density
$\mathcal{N}_0/2$. Since it was also pointed out in Chapter 4 that the optimum
receiver in this case can decide on each bit independently of every other,
we characterize this signaling scheme as “bit-by-bit” transmission.

For any choice of $R$ and $P_s$, Eqs. 5.10 state that the probability of error
tends to 1 as $T$, hence $K$, becomes large. For fixed $T$ and $\mathcal{N}_0$ the
probability of error can be made small only by increasing the energy expended
per bit, $E_b$, either by increasing the average power $P_s$ or by decreasing the
rate $R$. These results are intuitively agreeable; indeed, for many years
communicators assumed that decreased error probability could be achieved
only by increasing power or decreasing rate.
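
The drift of the bit-by-bit error probability toward 1 can be seen numerically. The sketch below is illustrative: the function names are hypothetical, $Q(\cdot)$ is computed from the standard-library `erfc` under the usual definition, and the operating point $E_b/\mathcal{N}_0 = 4$ is an arbitrary assumption.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_by_bit_error(k: int, eb_over_n0: float) -> float:
    """P[E] = 1 - (1 - p)^K with p = Q(sqrt(2 Eb/N0)) (Eqs. 5.10).

    log1p and expm1 are used so that very small p is handled accurately.
    """
    p = q_function(math.sqrt(2.0 * eb_over_n0))
    return -math.expm1(k * math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical operating point: Eb/N0 = 4, block lengths K = 1 ... 10^4.
    for k in (1, 10, 100, 1000, 10000):
        print(k, bit_by_bit_error(k, 4.0))
```

However large the fixed energy per bit, the printed values increase with $K$, since each additional bit adds another chance of error.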

Block-Orthogonal Signaling 

To see that this assumption is false, we need only consider the second of
our examples, in which one out of $2^K$ orthogonal pulses is transmitted
every $T$ sec. For the particular example of the discrete pulse-position-modulated
(abbreviated PPM) orthogonal signal set illustrated in Fig. 5.2,
the transmitted signal can be written

$$s_i(t) = \sqrt{E_s}\, \varphi(t - i\tau); \qquad i = 0, 1, \ldots, 2^K - 1, \tag{5.11a}$$

where $i$ is the binary number specified by the $K$-bit input sequence and
$\varphi(t)$ is a unit-energy pulse of duration

$$\tau = \frac{T}{2^K}. \tag{5.11b}$$

We have seen in Chapter 4 (Eq. 4.111) that for any set of $M$ equally
likely equal-energy orthogonal signals the probability of error is bounded
by

$$P[\mathcal{E}] \le (M - 1)\, Q\!\left( \sqrt{\frac{E_s}{\mathcal{N}_0}} \right) < M e^{-E_s/2\mathcal{N}_0}, \tag{5.12}$$

Figure 5.2 Block-orthogonal waveform for message sequence 11010. [$K = 5$, $T = 32\tau$, $\tau = T/32$; energy $= E_s = TP_s$; $\int \varphi^2(t)\,dt = 1$]

with the second inequality following from Eq. 2.122. A bound that is
sometimes tighter is derived in Section 5.6, but Eq. 5.12 suffices to provide
insight into the behavior of $P[\mathcal{E}]$.

By substituting

$$M = 2^K = 2^{RT} \tag{5.13a}$$

and

$$E_s = KE_b = TP_s, \tag{5.13b}$$

Eq. 5.12 can be rewritten in the form

$$P[\mathcal{E}] < 2^{RT} e^{-TP_s/2\mathcal{N}_0} = \exp\left[ -T \left( \frac{P_s}{2\mathcal{N}_0} - R \ln 2 \right) \right]. \tag{5.14a}$$

We see that the probability of error approaches zero exponentially with
increasing $T$, as long as the rate $R$ satisfies the bound

$$R < \frac{P_s}{2\mathcal{N}_0 \ln 2} \approx 0.72\, \frac{P_s}{\mathcal{N}_0}. \tag{5.14b}$$

Expressions equivalent to Eqs. 5.14 are

$$P[\mathcal{E}] < 2^K e^{-K(E_b/2\mathcal{N}_0)} = \exp\left[ -K \left( \frac{E_b}{2\mathcal{N}_0} - \ln 2 \right) \right] \tag{5.15a}$$

and

$$\frac{E_b}{\mathcal{N}_0} > 2 \ln 2 \approx 1.39. \tag{5.15b}$$

The contrast between the results obtained with bit-by-bit transmission
(Eqs. 5.10) and those obtained when orthogonal signals are used to
transmit a whole block of $K$ input bits simultaneously (“block-orthogonal
signaling”) is dramatic. In the first case increasing $K$ forces the
probability of error toward unity regardless of how large we make the energy
ratio per bit, $E_b/\mathcal{N}_0$. In the second, by increasing $K$ we can force the
probability of error to be as close to zero as we wish, provided that $E_b/\mathcal{N}_0$
exceeds 1.39. An alternative statement is that the signal-to-noise power
ratio $P_s/\mathcal{N}_0$ implies a bound on the maximum rate of communication;
at rates below this maximum the $P[\mathcal{E}]$ can be made as small as we wish by
choosing $T$ sufficiently large.
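
The exponential behavior of the bound in Eq. 5.15a is easy to tabulate. In the sketch below the function name and the sample values of $E_b/\mathcal{N}_0$ are assumptions made for illustration. The bound is clipped at 1, since a probability bound larger than 1 carries no information.

```python
import math


def block_orthogonal_bound(k: int, eb_over_n0: float) -> float:
    """Bound of Eq. 5.15a, exp[-K (Eb/2N0 - ln 2)], clipped at 1."""
    exponent = -k * (eb_over_n0 / 2.0 - math.log(2.0))
    if exponent >= 0.0:
        # Eb/N0 is at or below 2 ln 2: the bound is trivial.
        return 1.0
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical values of Eb/N0, one below and two above 2 ln 2 (about 1.39).
    for ratio in (1.0, 2.0, 4.0):
        print(ratio, [block_orthogonal_bound(k, ratio) for k in (5, 10, 20, 40)])
```

Above the threshold of Eq. 5.15b the bound falls exponentially with $K$; below it the bound says nothing, which is consistent with the later remark that $P[\mathcal{E}]$ tends to 1 for small $E_b/\mathcal{N}_0$.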

Geometric Interpretation 

The geometry of the signal-vector constellations for the two signaling
schemes just considered provides insight into the contrast between their
performances. As shown in Fig. 5.3 for bit-by-bit signaling, the distance
between nearest neighbors remains fixed as $K$ increases, whereas the
number of nearest neighbors and the number of dimensions occupied by the
signal set increase linearly with $K$. The probability that at least one of the
$K$ relevant noise components will carry the received signal vector closer to
a neighbor than to the transmitted vector becomes large as $K$ increases;
there are $K$ chances for this to happen.

Figure 5.3 Bit-by-bit signal geometry.

On the other hand, in the block-orthogonal case the distance between
nearest neighbors grows linearly with $\sqrt{K}$, as indicated by Fig. 5.4.
When $K$ increases from $j - 1$ to $j$, this growth in distance is achieved by
introducing a new dimension for each of the $2^{j-1}$ additional signals and
rescaling in amplitude. Even though the number of nearest neighbors
grows as $2^K$ (all signals are nearest neighbors), the growth in the distance
between signals dominates the probability-of-error behavior for large
values of $E_b/\mathcal{N}_0$. Conversely, for small $E_b/\mathcal{N}_0$ we shall see that the growth
in number of neighbors dominates and that $P[\mathcal{E}] \to 1$ as $K$ becomes large.

Figure 5.4 Block-orthogonal signal geometry. The geometry obtains for each pair of
signals $(\mathbf{s}_i, \mathbf{s}_j)$; $0 \le i, j \le 2^K - 1$, $j \ne i$.

5.3 TIME, BANDWIDTH, AND DIMENSIONALITY 

It might seem that the block-orthogonal PPM signaling scheme
provides a solution to the general problem of accurate, efficient
communication over a Gaussian channel. Unfortunately, such is not the case: for
$R$ close to $0.72\,P_s/\mathcal{N}_0$, a very large value of $T$ is required to obtain a
large negative exponent in the bound of Eq. 5.14a; however, very large $T$
in turn implies that the number $2^{RT}$ of orthogonal waveforms required
in the signal set is enormous. We shall see that a channel with a given
finite bandwidth cannot accommodate $2^{RT}$ orthogonal waveforms as $T$
increases while $R$ is fixed. All physical channels are characterized by a
finite-bandwidth constraint, hence no block-orthogonal signaling scheme
can be built for fixed rate and arbitrarily large $T$.
Signal Dimensionality as a Function of T

A measure of the constraint imposed by finite bandwidth on the
dimensionality of a signal set can be gained from theorems due to Shannon
and to Landau and Pollak, which we state without proof.†

Dimensionality theorem.

Let $\{\varphi_j(t)\}$ denote any set of orthogonal waveforms of duration $T$ and
“bandwidth” $W$. More precisely, require that each $\varphi_j(t)$

(1) be identically zero outside a time interval of duration $T$ and

(2) have no more than $\tfrac{1}{12}$ of its energy outside the frequency interval
$-W < f < W$.

Then the number of different waveforms in the set $\{\varphi_j(t)\}$ is overbounded
(conservatively) by $2.4\,TW$ when $TW$ is large.

The definition of bandwidth in this theorem may seem somewhat
arbitrary, but any meaningful evaluation of the bandwidth occupied by a
time-limited, low-frequency waveform can be expressed as some constant
times that bandwidth, $W$, just large enough to incorporate $\tfrac{11}{12}$ of the
waveform's energy.‡ Thus the theorem actually has unrestricted applicability.
The important fact is that the number of orthogonal waveforms
(dimensions) that can be accommodated by a “bandlimited” channel can grow
no faster than linearly with time, $T$, regardless of how “bandwidth” is
defined.
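
The conflict between exponential demand and linear supply can be made concrete. The sketch below compares the $2^{RT}$ orthogonal waveforms needed by block-orthogonal signaling with the $2.4\,TW$ overbound quoted in the theorem. It is a heuristic illustration only: the function name and the numbers (1000 bits/sec, 1 Mc of bandwidth) are assumptions, and the overbound itself is stated only for large $TW$.

```python
import math


def fits_in_bandwidth(rate_bits_per_sec: float, bandwidth_hz: float, t_sec: float) -> bool:
    """True if 2^(RT) does not exceed the 2.4*T*W overbound.

    The comparison is made between logarithms so that large RT does not overflow.
    """
    needed_log2 = rate_bits_per_sec * t_sec
    available = 2.4 * t_sec * bandwidth_hz
    return available > 0.0 and needed_log2 <= math.log2(available)


if __name__ == "__main__":
    # Hypothetical channel: R = 1000 bits/sec, W = 1 Mc.
    for t in (0.005, 0.01, 0.015, 0.02, 0.05):
        print(t, fits_in_bandwidth(1000.0, 1.0e6, t))
```

Under these assumed numbers the required set already outgrows the bound for block durations of a few hundredths of a second, even though the bandwidth is a thousand times the rate.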

The converse statement, that the number of dimensions (say $N$) available
with a bandlimited channel can grow linearly with $T$, is easy to
demonstrate. We wish to show that

$$N = DT, \tag{5.16}$$

where $D$, the number of dimensions available per second, varies linearly with
$W$ but is relatively insensitive to $T$. As a first example consider a pulse
$x(t)$ that is identically zero outside a time interval of duration $\tau$ and
occupies some (suitably defined) bandwidth $W$. Then $T/\tau$ such pulses can
be placed without overlap into a time interval of duration $T$. Since
nonoverlapping pulses are orthogonal, this scheme provides a means of
obtaining $D = 1/\tau$ dimensions per second.

Insight into the relationship between D and W is gained by considering 
the inverse scaling that exists between the time and frequency domains; 

† See Appendix 5A. Dollard$^{23}$ has obtained the tighter result that if each $\varphi_j(t)$ has no more than $\eta_W^2$ of its energy outside of $(-W, W)$, then the number of different waveforms is overbounded by $2TW/(1 - \eta_W^2)$ for all values of $TW$.
‡ Appendix 5B.
if the Fourier transform of $x(t)$ is $X(f)$, the Fourier transform of $x(\alpha t)$ is

$$\int_{-\infty}^{\infty} x(\alpha t)\,e^{-j2\pi ft}\,dt = \frac{1}{\alpha}\int_{-\infty}^{\infty} x(\beta)\,e^{-j2\pi (f/\alpha)\beta}\,d\beta = \frac{1}{\alpha}\,X\!\left(\frac{f}{\alpha}\right). \tag{5.17}$$

Thus, if a pulse $x(t)$ of duration $\tau$ occupies a bandwidth $W$, the pulse $x(\alpha t)$ has duration $\tau/\alpha$ and occupies a bandwidth $\alpha W$. It follows that $D = \alpha/\tau$ of the pulses $x(\alpha t)$ can be placed without overlap in a one-second interval, which verifies the fact that $D$ is proportional to bandwidth.

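As a second example of the converse statement, consider $T$-second pulses of sine and cosine waves separated in frequency by $1/T$ cps, such as

$$
\begin{aligned}
s_0(t) &= 1 \\
s_1(t) &= \sqrt{2}\,\sin 2\pi\frac{t}{T} \\
s_2(t) &= \sqrt{2}\,\cos 2\pi\frac{t}{T} \\
s_3(t) &= \sqrt{2}\,\sin 4\pi\frac{t}{T} \\
s_4(t) &= \sqrt{2}\,\cos 4\pi\frac{t}{T}
\end{aligned}
\qquad -\frac{T}{2} \le t \le \frac{T}{2}.
$$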

Each waveform is zero for $|t| > T/2$. Clearly, all such waveforms are mutually orthogonal. The corresponding signal spectra are related to the spectrum, $S_0(f)$, of $s_0(t)$ as indicated by Fig. 5.5. It can be verified through integration by parts and use of the tabulated sine-integral function$^{46}$ that

$$\int_{-1/T}^{1/T} |S_0(f)|^2\,df > 0.9\int_{-\infty}^{\infty} |S_0(f)|^2\,df. \tag{5.18}$$

It follows from Eq. 5.18 and Fig. 5.5 that, when $TW$ is an integer, a total of $1 + 2[W/(1/T)] = 1 + 2TW$ such signals can be accommodated in a bilateral frequency interval of bandwidth $(W + 1/T)$ with at least 90 per cent of the energy of every signal contained within this bandwidth.
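The two claims of this example, mutual orthogonality of the sine and cosine pulses and the 90 per cent figure of Eq. 5.18, can be checked numerically. The following is only a sketch: the function names, the midpoint-rule integration, and the step counts are illustrative choices and are not part of the text.

```python
import math

def inner_product(f, g, T, steps=2000):
    """Midpoint-rule approximation of the integral of f(t)*g(t) over [-T/2, T/2]."""
    dt = T / steps
    total = 0.0
    for n in range(steps):
        t = -T / 2 + (n + 0.5) * dt
        total += f(t) * g(t) * dt
    return total

def make_signals(T, count):
    """Return s_0, s_1, ...: the constant pulse, then sqrt(2) sin/cos pairs spaced 1/T apart."""
    signals = [lambda t: 1.0]
    k = 1
    while len(signals) < count:
        signals.append(lambda t, k=k: math.sqrt(2) * math.sin(2 * math.pi * k * t / T))
        if len(signals) < count:
            signals.append(lambda t, k=k: math.sqrt(2) * math.cos(2 * math.pi * k * t / T))
        k += 1
    return signals

def main_lobe_energy_fraction(steps=20000):
    """Fraction of the energy of S_0(f) = T sin(pi f T)/(pi f T) lying in |f| < 1/T.

    With x = f*T the fraction is the integral of (sin(pi x)/(pi x))^2 over [-1, 1],
    because the same integrand integrates to 1 over the whole axis.
    """
    dx = 2.0 / steps
    total = 0.0
    for n in range(steps):
        x = -1.0 + (n + 0.5) * dx  # midpoints never land exactly on x = 0
        total += (math.sin(math.pi * x) / (math.pi * x)) ** 2 * dx
    return total

if __name__ == "__main__":
    T = 2.0
    sigs = make_signals(T, 5)
    for i, si in enumerate(sigs):
        row = [inner_product(si, sj, T) / T for sj in sigs]
        print(i, [round(v, 6) for v in row])
    print("main-lobe energy fraction:", main_lobe_energy_fraction())
```

Dividing each inner product by $T$ should give the identity matrix to within rounding, and the main-lobe fraction should come out slightly above 0.9.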

A difficulty in transmitting sequences of orthogonal pulses is that most physical channels introduce distortion; pulses that do not overlap when transmitted tend to be smeared together when they are received, as indicated in Fig. 5.6. The result, called intersymbol interference, is that strict orthogonality is lost and the value of $D$ attainable in practice reduced. A brute-force remedy is to provide sufficient dead time between pulses that the interference is reduced to manageable proportions; elegant approaches require careful waveshaping of the transmitted pulses and/or elaborate filters. In practice, the maximum number of essentially orthogonal waveforms that can be transmitted in time $T$ through a channel with nominal bandwidth $W$ is between TW and f TW; the choice of definition for $W$ and the cost of implementation are the determining factors.

Figure 5.5 Spectrum of $T$-sec cosine pulse at frequency $k/T$. The spectrum when $k = 0$ is $S_0(f) = T\,\sin(\pi f T)/(\pi f T)$.


Figure 5.6 Intersymbol interference. The solid curve in (b) is the composite received signal, obtained by summing the responses (dashed curves) due to each of the three transmitted pulses shown in (a).


Bandwidth Requirements with Block-Orthogonal Signaling 

It is now easy to show that bandlimited transmission channels preclude the unrestricted use of block-orthogonal signaling. As we have seen, when a transmitter is connected to a source that provides input bits at a rate $R$ per second, the number of bits that must be transmitted in time $T$ is $RT$ and the number of different signals required is $M = 2^{RT}$. If we insist that these signals be orthogonal, the dimensionality theorem states that the number of orthogonal signals, $M$, and the bandwidth, $W$, satisfy

$$M = 2^{RT} \le 2.4\,TW, \tag{5.19a}$$

or

$$W \ge \frac{2^{RT}}{2.4\,T}. \tag{5.19b}$$

As $T$ becomes large, $W$ grows almost exponentially and therefore exceeds the bandwidth of any physical channel.

The import of an exponential growth in bandwidth is made tangible by the following example. Consider a system operating at the modest rate of 100 bits/sec and assume that $R$ and $P_s/N_0$ in Eq. 5.14a are such that $T = 1$ sec is necessary to achieve the desired probability of error. Then

$$W \ge \frac{2^{100}}{2.4} \approx 10^{30} \text{ cps},$$

which is clearly outlandish. Viewed in the time domain, Eq. 5.19a states that if we wished to realize this system by using a block-orthogonal PPM scheme the number of nonoverlapping pulses per second would have to be $2^{100}$, which implies a pulse duration of $10^{-21}$ nanosecond!
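The arithmetic of this example follows directly from Eq. 5.19b and is easily reproduced. In the sketch below the function names are illustrative only; the constant 2.4 is the one appearing in the dimensionality theorem.

```python
def min_bandwidth_cps(rate_bps, block_sec):
    """Lower bound on W from Eq. 5.19b: W >= 2**(R*T) / (2.4*T)."""
    m = 2.0 ** (rate_bps * block_sec)      # number of orthogonal signals, M = 2**(RT)
    return m / (2.4 * block_sec)

def ppm_pulse_duration_ns(rate_bps, block_sec):
    """Duration (in nanoseconds) of each of the M nonoverlapping PPM pulses in time T."""
    m = 2.0 ** (rate_bps * block_sec)
    return block_sec / m * 1e9

if __name__ == "__main__":
    for T in (0.1, 0.5, 1.0):
        print(f"T = {T:4.1f} sec: W >= {min_bandwidth_cps(100, T):.3e} cps, "
              f"pulse = {ppm_pulse_duration_ns(100, T):.3e} ns")
```

For $R = 100$ bits/sec the bound moves from a few hundred cps at $T = 0.1$ sec to roughly $5 \times 10^{29}$ cps at $T = 1$ sec, which is the order of magnitude quoted above.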


5.4 EFFICIENT SIGNAL SELECTION 

In Section 5.2 we observed that block-orthogonal signaling over an additive white Gaussian noise channel would yield a probability of error that approaches zero exponentially with increasing block duration $T$ for rates $R$ less than $0.72\,P_s/N_0$. The drawback was that the bandwidth requirement becomes exponentially large (substantially infinite) for large $T$. We now show that it is possible to achieve a probability-of-error behavior analogous to that of orthogonal signaling while simultaneously meeting the bandlimited channel constraint that the dimensionality of the signal space grow only linearly with $T$.

A direct demonstration of this fact is not possible for two reasons. First, unless some regular structure is imposed (as in the two examples in Section 5.2), the mere task of specifying a set of $M = 2^{RT}$ different signals is enormous when $T$ is large. Second, even if the problem of signal specification were manageable, in general we would be unable to analyze the $P[\mathcal{E}]$ that results from use of the specified signal set. Strangely enough, it is much easier to demonstrate that as $T$ becomes large a great many signal sets with linearly increasing dimensionality yield an exponentially decreasing probability of error (for rates that are not too high) than it is to exhibit a single specific set of signals behaving in this way.

Signaling with Sequences of Binary Waveforms 

As a first example, let us consider a case in which the available number of dimensions per second, $D$, exceeds the rate $R$:

$$D > R. \tag{5.20a}$$

For simplicity, we again (as in Section 5.2) restrict the signals to lie on the vertices of a hypercube. Since the number of vertices on a hypercube of $DT$ dimensions is $2^{DT}$ and the number of signals required is $M = 2^{RT}$, not all of the vertices need be used. In fact, the fraction of vertices that we must use,

$$\frac{2^{RT}}{2^{DT}} = 2^{-(D-R)T}, \tag{5.20b}$$

approaches zero as $T$ increases. Thus there is a possibility that we can avoid the convergence of the probability of error to unity with increasing $T$ which we observed in Section 5.2 as a consequence of the nearest-neighbor structure when $D = R$.

Restricting the signals $\{s_i(t)\}$ to the vertices of a hypercube implies that each signal has the form

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\,\varphi_j(t) \quad \text{for } i = 0, 1, \ldots, M-1, \tag{5.21a}$$

where

$$s_{ij} = \pm\sqrt{E_N}; \quad \text{all } i \text{ and } j, \tag{5.21b}$$

$$N \triangleq DT, \text{ the number of dimensions in time } T, \tag{5.21c}$$

and $E_N$ is defined as the available signal energy per dimension. As in Chapter 4, $\{\varphi_j(t)\}$ can be any set of orthonormal waveforms:

$$\int_{-\infty}^{\infty} \varphi_l(t)\,\varphi_j(t)\,dt = \delta_{lj}; \quad \text{all } l \text{ and } j.$$

For example, the $\{\varphi_j(t)\}$ might be successively delayed, nonoverlapping replicas of some finite-duration, unit-energy pulse, as shown in Fig. 5.7.
The constraint on the average transmitted power, $P_s$, requires

$$E_s = P_s T = \sum_{j=1}^{N} s_{ij}^2 = N E_N, \tag{5.22a}$$

or

$$E_N = \frac{E_s}{N} = \frac{P_s}{D} \text{ joules/dimension.} \tag{5.22b}$$

For the $\{\varphi_j(t)\}$ of Fig. 5.7 the signals $\{s_i(t)\}$ are sequences of positive and negative nonoverlapping pulses, each pulse containing energy $E_N$.


Figure 5.7 Orthonormal (pulse position) waveforms: $\varphi_j(t) = \varphi(t - j\tau)$, with $\int \varphi^2(t)\,dt = 1$.

The average probability of error. The problem of signal selection for this particular example reduces to the assignment of the vectors of coefficients $\{s_{ij}\}$ in Eq. 5.21a:

$$\mathbf{s}_i = (s_{i1}, s_{i2}, \ldots, s_{iN}); \quad i = 0, 1, \ldots, M-1. \tag{5.23}$$

As we have mentioned, a good specific assignment is hard to find and hard to analyze. These complications can be circumvented by bounding the attainable probability of error by an ingenious indirect argument due to Shannon.$^{75}$ The key to the derivation is to consider not just one communication system, but rather a whole collection of communication systems, each consisting of a transmitter, channel, and optimum receiver. As shown in Fig. 5.8, the systems are identical, except that each employs a different set of signals $\{\mathbf{s}_i\}$.

There are $2^N = 2^{DT}$ different vertices available on our $N$-dimensional signal space hypercube and $M = 2^{RT}$ signals $\{\mathbf{s}_i\}$ to be assigned thereon; it follows that there are $(2^N)^M = 2^{NM}$ distinct ways to assign the $M$ signals. We assume that each of these $2^{NM}$ signal sets is used by one (and only one) of the communication systems in our collection, and that each system uses a receiver that is optimum for its signal set. Following common usage, we refer to the signal sets as codes, and to the signal vectors as codewords.
It is clear that each system in our collection has a definite probability of error, say $P_l$ for the $l$th system, $l = 1, 2, \ldots, 2^{NM}$. Some of the systems (for example, those with codes in which all $M$ of the vectors $\{\mathbf{s}_i\}$ are assigned to the same vertex) have a very large probability of error. On the other hand, most of the systems have a probability of error that is quite small, a fact that we shall prove by calculating a bound on the arithmetic average, denoted $\overline{P[\mathcal{E}]}$, over the entire collection:

$$\overline{P[\mathcal{E}]} \triangleq \frac{1}{2^{NM}} \sum_{l=1}^{2^{NM}} P_l. \tag{5.24}$$

Clearly, not all of the $P_l$ can be greater than $\overline{P[\mathcal{E}]}$.

Figure 5.8 Collection of communication systems, each using a different set of $M$ signals $\{\mathbf{s}_i\}_l$, $l = 1, 2, \ldots, 2^{NM}$.

It may be surprising that one can bound the average probability of error for a collection of communication systems when one cannot calculate the probability of error of an individual system. Such was Shannon's insight.

To calculate a bound on $\overline{P[\mathcal{E}]}$, we first interpret Eq. 5.24 as a statistical rather than an arithmetic average. Although this interpretation is not essential, it simplifies the derivation by permitting us to use the notation and results of the preceding three chapters. Consider a probability system in which each point $\omega$ of the sample space has associated with it one of the systems of Fig. 5.8 as well as a message, a noise waveform, and the resulting received waveform. The probability assigned to the system utilizing code $\{\mathbf{s}_i\}$ is

$$P[\{\mathbf{s}_i\}] = 2^{-NM}, \tag{5.25a}$$

and is statistically independent of the message and the noise process. If the code for the $l$th system is $\{\mathbf{s}_i\}_l$, we have

$$P[\mathcal{E} \mid \{\mathbf{s}_i\}_l] = P_l. \tag{5.25b}$$

By using Eqs. 5.25, Eq. 5.24 may be rewritten

$$\overline{P[\mathcal{E}]} = E[P_l] = \sum_{\text{all codes}} P[\mathcal{E} \mid \{\mathbf{s}_i\}]\,P[\{\mathbf{s}_i\}]. \tag{5.26}$$

We now bound $\overline{P[\mathcal{E}]}$. When message $m_k$ is transmitted, the conditional probability of error, $\overline{P[\mathcal{E} \mid m_k]}$, averaged over the collection of codes is

$$\overline{P[\mathcal{E} \mid m_k]} = \sum_{\text{all codes}} P[\{\mathbf{s}_i\}]\,P[\mathcal{E} \mid m_k, \{\mathbf{s}_i\}], \tag{5.27}$$

in which $P[\mathcal{E} \mid m_k, \{\mathbf{s}_i\}]$ is the conditional probability of error, given $m = m_k$, for a specific code $\{\mathbf{s}_i\}$. Application of the union bound of Eq. 4.109 to each specific code yields

$$P[\mathcal{E} \mid m_k, \{\mathbf{s}_i\}] \le \sum_{\substack{i=0 \\ (i \ne k)}}^{M-1} P_2[\mathbf{s}_i, \mathbf{s}_k], \tag{5.28}$$

where $P_2[\mathbf{s}_i, \mathbf{s}_k]$ is the probability of error when the two signal vectors $\mathbf{s}_i$ and $\mathbf{s}_k$ are used to communicate one of two equally likely messages.

If it were easy to evaluate the right-hand side of Eq. 5.28, there would be no need to consider the collection (ensemble) of possible communication systems. This evaluation, however, requires both explicit knowledge of the signal set $\{\mathbf{s}_i\}$ and unlimited patience. The crucial advantage to be gained by considering the ensemble of systems is that both difficulties are avoided by an interchange in order of summations. Substituting Eq. 5.28 in Eq. 5.27, we have

$$\overline{P[\mathcal{E} \mid m_k]} \le \sum_{\text{all codes}} P[\{\mathbf{s}_i\}] \sum_{\substack{i=0 \\ (i \ne k)}}^{M-1} P_2[\mathbf{s}_i, \mathbf{s}_k].$$

Interchanging the order of summations yields

$$\overline{P[\mathcal{E} \mid m_k]} \le \sum_{\substack{i=0 \\ (i \ne k)}}^{M-1} \left( \sum_{\text{all codes}} P[\{\mathbf{s}_i\}]\,P_2[\mathbf{s}_i, \mathbf{s}_k] \right) = \sum_{\substack{i=0 \\ (i \ne k)}}^{M-1} \overline{P_2[\mathbf{s}_i, \mathbf{s}_k]}, \tag{5.29}$$

in which the bar denotes expectation over the ensemble of communication systems. Thus interchanging the order of summations makes averaging over the code ensemble the next step in the bounding of $\overline{P[\mathcal{E}]}$.

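For the additive white Gaussian noise channel, $P_2[\mathbf{s}_i, \mathbf{s}_k]$ depends only on the Euclidean distance between $\mathbf{s}_i$ and $\mathbf{s}_k$. In accordance with Eq. 4.110,

$$P_2[\mathbf{s}_i, \mathbf{s}_k] = Q\!\left(\frac{|\mathbf{s}_i - \mathbf{s}_k|}{\sqrt{2N_0}}\right). \tag{5.30}$$

If $\mathbf{s}_i$ and $\mathbf{s}_k$ differ in $h$ coordinates, the square of the distance between them is

$$|\mathbf{s}_i - \mathbf{s}_k|^2 = \sum_{j=1}^{N} (s_{ij} - s_{kj})^2 = h\,(2\sqrt{E_N})^2 = 4hE_N. \tag{5.31}$$

Over the ensemble of codes, the probability assignment of Eq. 5.25a implies that $\mathbf{s}_i$ is equally likely to be any of the $2^N$ vertices of the signal-space hypercube, independently of $\mathbf{s}_k$. Thus the probability that $s_{ij}$ equals $s_{kj}$ is $\tfrac{1}{2}$, independently for all $j = 1, 2, \ldots, N$. As a consequence, the probability that $\mathbf{s}_i$ and $\mathbf{s}_k$ will differ in $h$ coordinates is just the probability of getting $h$ Heads in $N$ tosses of an unbiased coin:

$$P[\mathbf{s}_i \text{ and } \mathbf{s}_k \text{ differ in } h \text{ coordinates}] = \binom{N}{h}\,2^{-N}. \tag{5.32a}$$

The expected value of $P_2[\mathbf{s}_i, \mathbf{s}_k]$ over the ensemble of codes is therefore

$$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} = \sum_{h=0}^{N} 2^{-N} \binom{N}{h}\,Q\!\left(\sqrt{\frac{2hE_N}{N_0}}\right). \tag{5.32b}$$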
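The coin-tossing statement can be confirmed by brute force for a small hypercube: enumerate every ordered pair of vertices, which is what independent, equally likely assignment of two codewords amounts to, and tally the number of differing coordinates. The sketch below is illustrative only; the function name is not from the text.

```python
from itertools import product
from math import comb

def hamming_distance_distribution(N):
    """Distribution of the number h of differing coordinates between two
    vertices of the N-dimensional hypercube chosen independently and uniformly."""
    vertices = list(product((+1, -1), repeat=N))
    counts = [0] * (N + 1)
    for s_i in vertices:
        for s_k in vertices:
            h = sum(a != b for a, b in zip(s_i, s_k))
            counts[h] += 1
    total = len(vertices) ** 2
    return [c / total for c in counts]

if __name__ == "__main__":
    N = 4
    for h, p in enumerate(hamming_distance_distribution(N)):
        print(h, p, comb(N, h) * 2.0 ** (-N))   # enumerated vs. binomial value
```

The two printed columns agree for every $h$, including $h = 0$, the case in which both codewords fall on the same vertex.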

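Since the right-hand side of Eq. 5.32b is independent of the indices $i$ and $k$, it is convenient to introduce the simpler notation

$$\overline{P_2[\mathcal{E}]} \triangleq \overline{P_2[\mathbf{s}_i, \mathbf{s}_k]}. \tag{5.33}$$

With this notation we observe that

$$\overline{P[\mathcal{E} \mid m_k]} \le \sum_{\substack{i=0 \\ (i \ne k)}}^{M-1} \overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} = (M-1)\,\overline{P_2[\mathcal{E}]} < M\,\overline{P_2[\mathcal{E}]},$$

$$\overline{P[\mathcal{E}]} = \sum_{k=0}^{M-1} \overline{P[\mathcal{E} \mid m_k]}\,P[m_k] < M\,\overline{P_2[\mathcal{E}]} \sum_{k=0}^{M-1} P[m_k] = M\,\overline{P_2[\mathcal{E}]}. \tag{5.34}$$

Bounding $\overline{P[\mathcal{E}]}$ now reduces to bounding $\overline{P_2[\mathcal{E}]}$.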
Recalling from Eq. 2.122 that

$$Q(\alpha) \le e^{-\alpha^2/2},$$

we substitute in Eq. 5.32b and obtain

$$\overline{P_2[\mathcal{E}]} \le \sum_{h=0}^{N} 2^{-N} \binom{N}{h} e^{-hE_N/N_0} = 2^{-N} \sum_{h=0}^{N} \binom{N}{h} e^{-hE_N/N_0}.$$

But

$$[1 + a]^N = \sum_{h=0}^{N} \binom{N}{h} a^h,$$

which implies

$$\overline{P_2[\mathcal{E}]} \le 2^{-N} \left[1 + e^{-E_N/N_0}\right]^N. \tag{5.35}$$

This may be written more concisely as

$$\overline{P_2[\mathcal{E}]} \le 2^{-NR_0}, \tag{5.36a}$$

in which we introduce the exponential bound parameter $R_0$, identified from Eq. 5.35 as

$$R_0 \triangleq \log_2 \frac{2}{1 + e^{-E_N/N_0}} = 1 - \log_2\!\left(1 + e^{-E_N/N_0}\right); \quad \text{antipodal signaling.} \tag{5.36b}$$

Finally, the combination of Eqs. 5.34 and 5.36 yields the end result of our analysis, the bound

$$\overline{P[\mathcal{E}]} < M\,\overline{P_2[\mathcal{E}]} \le M\,2^{-NR_0}. \tag{5.37}$$
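As a numerical check on the chain from Eq. 5.32b to Eq. 5.36, one may evaluate the exact ensemble average of the two-signal error probability and compare it with $2^{-NR_0}$. The sketch below assumes the standard relation $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$; the function names are illustrative only.

```python
import math

def Q(x):
    """Gaussian tail probability, Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def avg_pairwise_error(N, en_over_n0):
    """Exact ensemble average of P_2 from Eq. 5.32b."""
    return sum(
        math.comb(N, h) * 2.0 ** (-N) * Q(math.sqrt(2.0 * h * en_over_n0))
        for h in range(N + 1)
    )

def r0_binary(en_over_n0):
    """Exponential bound parameter of Eq. 5.36b, in bits per dimension."""
    return 1.0 - math.log2(1.0 + math.exp(-en_over_n0))

if __name__ == "__main__":
    for N in (4, 16, 64):
        for x in (0.1, 1.0, 10.0):
            exact = avg_pairwise_error(N, x)
            bound = 2.0 ** (-N * r0_binary(x))
            print(f"N={N:3d}  E_N/N_0={x:5.1f}  exact={exact:.3e}  bound={bound:.3e}")
```

In every case the exact average lies below the bound, and both decay exponentially with $N$ at a rate set by $R_0$.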

Defining $R_N$ as the transmitter rate in bits per dimension,

$$R_N \triangleq \frac{R \text{ bits/second}}{D \text{ dimensions/second}}, \tag{5.38a}$$

so that

$$M = 2^{RT} = 2^{NR_N}, \tag{5.38b}$$

we can rewrite the bound in the convenient form

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}. \tag{5.38c}$$

Equations 5.38 state that as long as $R_N$ is less than the exponential bound parameter $R_0$, the average probability of error (hence the probability of error for at least one code in the collection) can be made arbitrarily small by taking $N$ sufficiently large. The number of dimensions $N$ is frequently called the code block length.
The parameter $R_0$ is plotted in Fig. 5.9 as a function of the signal-energy-to-noise ratio per dimension, $E_N/N_0$. Since the maximum value of $R_0$ is unity (corresponding to $E_N \to \infty$ in Eq. 5.36), the exponent $[R_0 - R_N]$ in Eq. 5.38 can never be positive for a rate $R_N$ greater than or equal to 1 bit/dimension. This is consistent with the result given in Section 5.2 for bit-by-bit signaling: when $R_N = 1$, we have $R = D$, which implies that the required number of signals equals the number of available hypercube vertices. Using antipodal signaling (binary codes) restricts the system to operation at rates $R$ less than $D$ bits/sec if the probability of error is required to be arbitrarily small.

Figure 5.9 $R_0$ for binary antipodal signaling. The units of $R_0$ are the same (bits/dimension) as those of $R_N$.

Selecting a specific code. Although the class of all possible codes (signaling sets) constructed in accordance with Eqs. 5.21 has been shown to yield an average probability of error that decays exponentially with increasing $N$ when the bit rate per dimension is not too great, we have not yet considered the problem of selecting a single, specific code. It is evident that this problem is not a sensitive one insofar as error probability is concerned. The quantity $\overline{P[\mathcal{E}]}$ is the average value of the positive quantities $\{P_l\}$, where $P_l$ is the probability of error for the $l$th code in the class. Since at most a fraction $1/\lambda$ of a set of positive numbers can be larger than $\lambda$ times their average, at least 90 per cent of all codes in the collection must have a $P[\mathcal{E}]$ no larger than $10\,\overline{P[\mathcal{E}]}$, and 99 per cent of all codes must have a $P[\mathcal{E}]$ no larger than $100\,\overline{P[\mathcal{E}]}$.

For rates such that $\overline{P[\mathcal{E}]}$ decays exponentially with $N$, it is possible when designing a system to choose $N$ large enough so that 10, 100, or even $1000\,\overline{P[\mathcal{E}]}$ will be as small as we like. For example, if $N = N_1$ is sufficient to guarantee $\overline{P[\mathcal{E}]} \le 10^{-8}$, clearly $N = \tfrac{5}{4}N_1$ is sufficient to guarantee $100\,\overline{P[\mathcal{E}]} \le 10^{-8}$. Measured in terms of the required fractional increase in code length $N$, only a small price need be paid to gain reasonable assurance that a code picked at random is good. Of course, once a good code has been chosen, it can be used for many transmissions and in many systems.
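The price in block length of such a safety margin can be read directly from the exponential bound $2^{-N[R_0 - R_N]}$. The sketch below computes the smallest $N$ meeting a target; the function name and the sample values $R_0 = 0.5$, $R_N = 0.25$ are illustrative assumptions, not values from the text.

```python
import math

def block_length_for_target(target, r0, rn):
    """Smallest integer N with 2**(-N*(r0 - rn)) <= target; requires rn < r0."""
    if not rn < r0:
        raise ValueError("the bound decays only for R_N < R_0")
    return math.ceil(-math.log2(target) / (r0 - rn))

if __name__ == "__main__":
    r0, rn = 0.5, 0.25                                  # hypothetical operating point
    n1 = block_length_for_target(1e-8, r0, rn)          # average over codes <= 1e-8
    n2 = block_length_for_target(1e-10, r0, rn)         # 100 x average <= 1e-8
    print(n1, n2, n2 / n1)
```

Tightening the target from $10^{-8}$ to $10^{-10}$ multiplies the exponent by 10/8, so the ratio of the two block lengths is close to 5/4 whatever the values of $R_0$ and $R_N$.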

Discussion. An intuitive understanding of why a $\overline{P[\mathcal{E}]}$ that decays exponentially with $N$ results for $R_N < R_0$ can be gained from the following considerations. We recall that $R_0$ specifies the exponential bound on $\overline{P_2[\mathcal{E}]}$, the mean probability of error (over the ensemble of codes) when one of two signals is equally likely to be transmitted over a channel disturbed by additive white Gaussian noise:

$$\overline{P_2[\mathcal{E}]} = \overline{Q\!\left(\frac{|\mathbf{s}_i - \mathbf{s}_k|}{\sqrt{2N_0}}\right)} \le 2^{-NR_0}.$$

On the average, two signals chosen independently at random from $2^N$ hypercube vertices differ from one another in approximately $N/2$ coordinates. Thus the root mean square distance between two such signals increases linearly with $\sqrt{N}$. Since Gaussian noise produces a probability of error that decays exponentially with the square of the Euclidean distance between two signals, it is reasonable that $\overline{P_2[\mathcal{E}]}$ should decay exponentially with $N$.

Two phenomena enter into the occurrence of an error when $M = 2$ and the signals are chosen at random. The first is that the noise may be unusually large and cause an error even though the Euclidean distance between the two signals is typical, as shown in Fig. 5.10a. The second is that the noise may be typical but the two signals may be poor in the sense that the distance between them is unusually small (see Fig. 5.10b). The value of $R_0$ in Eq. 5.36 represents the combined influences of these two phenomena. When $E_N/N_0$ is large, $R_0$ approaches unity, and the $\overline{P_2[\mathcal{E}]}$ approaches $2^{-N}$. But $2^{-N}$ is just the probability of assigning the two signals to the same hypercube vertex; we recognize that it is the second phenomenon that dominates $R_0$ when $E_N/N_0$ is large. On the other hand, when $E_N/N_0$ is small, errors are likely to occur even when the two signals are typical in the sense that they differ in approximately $N/2$ components. Under these circumstances $R_0$ is dominated by the first phenomenon.

Figure 5.10 Two possible signal pairs: (a) typical spacing; (b) small spacing.

This heuristic discussion is extended to the case of $M$ randomly selected signals by recognizing that there are three distinct and statistically independent selections entering into the occurrence of error:

1. The data source selects the transmitter input $m$.

2. Nature selects the relevant noise $\mathbf{n}$.

3. The communication system engineer selects the signals $\{\mathbf{s}_i\}$.

$\overline{P[\mathcal{E}]}$ denotes the probability of the event error in the product ensemble describing the three selections. It is convenient to visualize these selections as taking place in the order listed and to assume $m$ is $m_k$. We may also visualize that the system engineer first selects the transmitted signal $\mathbf{s}_k$ and then the remaining $M - 1$ signals. An error occurs if and only if one or more of the $M - 1$ remaining signals (which over the ensemble are selected without reference to $\mathbf{s}_k$, $\mathbf{n}$, or each other) lie closer to $\mathbf{r} = \mathbf{s}_k + \mathbf{n}$ than the distance $|\mathbf{n}|$ from $\mathbf{s}_k$ to $\mathbf{r}$, as indicated in Fig. 5.11. For each of the remaining signals the probability of falling into this forbidden region is, by definition, $\overline{P_2[\mathcal{E}]}$. Since there are $(M - 1)$ chances for some signal to fall into the forbidden region, we immediately have the union bound

$$\overline{P[\mathcal{E}]} \le (M - 1)\,\overline{P_2[\mathcal{E}]}.$$

The average probability of error approaches zero with increasing $N$ as long as the number of messages $M = 2^{NR_N}$ grows with $N$ less rapidly than $\overline{P_2[\mathcal{E}]}$ decays.

Figure 5.11 An error occurs if any signal $\mathbf{s}_i$ falls into the shaded region, since then $|\mathbf{r} - \mathbf{s}_i| < |\mathbf{r} - \mathbf{s}_k| = |\mathbf{n}|$.

Comparison with block-orthogonal signaling. It is interesting to compare the bound of Eq. 5.38 with the behavior exhibited in Eqs. 5.15 for block-orthogonal signaling, namely,

$$P[\mathcal{E}] < 2^{-K[(E_b/N_0)(1/(2 \ln 2)) - 1]}, \tag{5.39a}$$

where $K$ as usual denotes the number of transmitter input bits during an interval $T$. Thus the energy per bit utilized with orthogonal signaling must satisfy the bound

$$\frac{E_b}{N_0} > 2 \ln 2; \quad \text{for orthogonal signals} \tag{5.39b}$$

in order that the bound on probability of error tend to zero with increasing block size, $K$.

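The corresponding limitation on $E_b/N_0$ with binary coding is obtained by rewriting Eq. 5.38c in the same form as Eq. 5.39a. Since

$$K = RT = NR_N,$$

we have

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]} = 2^{-K[(R_0/R_N) - 1]}. \tag{5.40a}$$

Moreover,

$$E_N = \text{energy per dimension} = \left(\frac{\text{energy}}{\text{bit}}\right)\left(\frac{\text{bits}}{\text{dimension}}\right) = E_b R_N, \tag{5.40b}$$

so that (from Eq. 5.36)

$$\frac{R_0}{R_N} = \frac{E_b R_0}{E_N} = \frac{E_b}{N_0}\,\frac{R_0}{E_N/N_0} = \frac{E_b}{N_0}\,\frac{1 - \log_2\!\left(1 + e^{-E_N/N_0}\right)}{E_N/N_0}. \tag{5.40c}$$

In terms of the parameter

$$\alpha \triangleq \frac{E_N/N_0}{1 - \log_2\!\left(1 + e^{-E_N/N_0}\right)}, \tag{5.41a}$$

Eq. 5.40a becomes

$$\overline{P[\mathcal{E}]} < 2^{-K[(E_b/N_0)(1/\alpha) - 1]}. \tag{5.41b}$$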


Figure 5.12 Lower bound to the allowable ratio $E_b/N_0$ for binary-coded systems.

For the bound on $\overline{P[\mathcal{E}]}$ to go to zero with increasing $K$, we require

$$\frac{E_b}{N_0} > \alpha; \quad \text{for binary-coded signals.} \tag{5.41c}$$

The parameter $\alpha$ is plotted in Fig. 5.12 as a function of $E_N/N_0$. Its minimum value, attained as $E_N/N_0 \to 0$, is $2 \ln 2$, and $\alpha$ exceeds this minimum only slightly for $E_N/N_0 < -10$ db. Thus the exponential decay of the average probability of error bound over all codes of the class considered here is substantially equivalent to that obtained with block-orthogonal signaling, provided that $D$ can be made large enough so that $E_N/N_0$ is small. This corroborates our earlier observation that under this condition it is the noise that dominates $R_0$.
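The behavior of $\alpha$ described here can be tabulated directly from its definition as the ratio of $E_N/N_0$ to the binary $R_0$ of Eq. 5.36b. The following sketch uses illustrative function names and sample points chosen only for display.

```python
import math

def r0_binary(en_over_n0):
    """Eq. 5.36b: R_0 for binary antipodal signaling, bits per dimension."""
    return 1.0 - math.log2(1.0 + math.exp(-en_over_n0))

def alpha(en_over_n0):
    """Lower bound on E_b/N_0 for binary-coded signals: (E_N/N_0) / R_0."""
    return en_over_n0 / r0_binary(en_over_n0)

if __name__ == "__main__":
    print("2 ln 2 =", 2.0 * math.log(2.0))
    for db in (-30, -20, -10, 0, 10):
        x = 10.0 ** (db / 10.0)                 # E_N/N_0 as a plain ratio
        print(f"{db:4d} db: alpha = {alpha(x):.4f}")
```

At $-10$ db the computed $\alpha$ is within about 3 per cent of $2 \ln 2 \approx 1.386$, whereas at $+10$ db it is close to 10, reflecting the saturation of $R_0$ at one bit per dimension.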

In Chapter 4, we claimed that simplex signals are optimum for communication over an additive white Gaussian noise channel and that orthogonal signals are substantially equivalent to simplex signals when the number of signals, $M$, is large. Since the exponential decay of $\overline{P[\mathcal{E}]}$ becomes substantially equivalent to that obtained with block-orthogonal signals, we conclude that the class of hypercube-vertex (binary-coded) signals may be considered to be "exponentially optimum" provided that the noise, the available number of dimensions per second $D$, and the received signal power $P_s$ are so related that

$$E_N = \frac{\text{energy/sec}}{\text{dimensions/sec}} = \frac{P_s}{D} < \frac{N_0}{10}, \tag{5.42a}$$

or

$$D > 10\,\frac{P_s}{N_0}. \tag{5.42b}$$

Signaling with Multilevel Sequences 

We have just inferred that the signal class consisting of binary-waveform sequences is exponentially optimum whenever the ratio $P_s/N_0$ is much smaller than the number of dimensions available per second, $D$. We have also observed in Fig. 5.9 that for this signal class $R_0$ saturates at one bit per dimension when $E_N/N_0 \gg 1$. Since we certainly expect that large enough $E_N/N_0$ should permit reliable communication at rates above one bit per dimension, we anticipate that the class of binary-waveform sequences will not be exponentially optimum when $E_N/N_0$ is large.

As noted in connection with Eq. 5.20, the saturation of $R_0$ in Fig. 5.9 is attributable to the fact that the total number of distinct binary-waveform sequences occupying $DT$ dimensions is $2^{DT}$, so that $R$ cannot exceed $D$ bits/sec. The only way to avoid this saturation effect is to augment the class of allowable signals. Since in many situations $P_s/N_0$ is large but the bandwidth is limited (for example, in digital communication over toll-grade telephone lines†), it is important to consider signal sets $\{\mathbf{s}_i\}$ that are not constrained to lie on the vertices of a hypercube.

† Although the noise on telephone circuits is not simply Gaussian, experiments$^{54}$ have demonstrated that a sizable improvement in rate can be achieved by the use of nonbinary waveforms of the kind to be discussed here.
Figure 5.13 Comparison of $R_0^*$ and $R_0$ for binary signaling. (Horizontal axis: energy ratio per dimension, $10 \log_{10} E_N/N_0$.)

Shannon,$^{74}$ in a derivation beyond the scope of this book, considers additive white Gaussian noise and $N$-dimensional signal sets $\{\mathbf{s}_i\}$ that are constrained only in energy:†

$$|\mathbf{s}_i|^2 \le NE_N = N\,\frac{P_s}{D}; \quad i = 0, 1, \ldots, M-1. \tag{5.43}$$

† See also Gallager.$^{32}$
He then shows that sets of $M = 2^{NR_N}$ signals exist for which the error probability is bounded by

$$P[\mathcal{E}] < 2^{-N[R_0^* - R_N]}; \quad 0 \le R_N \le R_0^*, \tag{5.44a}$$

with

$$R_0^* = \frac{1}{2}\left[\left(1 + \frac{E_N}{N_0} - \sqrt{1 + \left(\frac{E_N}{N_0}\right)^2}\,\right)\log_2 e + \log_2\!\left(\frac{1}{2}\left(1 + \sqrt{1 + \left(\frac{E_N}{N_0}\right)^2}\,\right)\right)\right]. \tag{5.44b}$$

In addition, he proves that no set of $M = 2^{NR_N}$ signals satisfying Eq. 5.43 exists such that the bound of Eq. 5.44a is valid for arbitrary $N$ and when $R_0^*$ is replaced by a larger number. (We shall see in Section 5.6, however, that more elaborate bounding techniques do yield tighter results for particular values of $R_N$.)

In Fig. 5.13 $R_0^*$ is plotted as a function of $E_N/N_0$, together with the $R_0$ achieved by the ensemble of binary-waveform sequences. Our intuitive feeling that rates greater than $D$ bits/sec must be attainable for large values of $P_s/N_0$ is justified. For $E_N/N_0 < 0$ db, $R_0$ nearly coincides with $R_0^*$, but for $E_N/N_0 > 0$ db the binary-waveform sequences are less desirable. For $E_N/N_0 > 10$ db they are exceedingly inefficient.
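The comparison of Fig. 5.13 can be sketched numerically. The expression used below for $R_0^*$, namely $\tfrac{1}{2}\left[(1 + x - \sqrt{1 + x^2})\log_2 e + \log_2\!\left(\tfrac{1}{2}(1 + \sqrt{1 + x^2})\right)\right]$ with $x = E_N/N_0$, is taken here as an assumption about the form of Eq. 5.44b; the function names are likewise illustrative.

```python
import math

def r0_binary(x):
    """Eq. 5.36b: R_0 for binary antipodal signaling, x = E_N/N_0."""
    return 1.0 - math.log2(1.0 + math.exp(-x))

def r0_star(x):
    """Assumed form of Eq. 5.44b for energy-constrained signals, x = E_N/N_0."""
    root = math.sqrt(1.0 + x * x)
    return 0.5 * ((1.0 + x - root) * math.log2(math.e)
                  + math.log2(0.5 * (1.0 + root)))

if __name__ == "__main__":
    for db in (-10, 0, 10, 20):
        x = 10.0 ** (db / 10.0)
        print(f"{db:4d} db: R0* = {r0_star(x):.4f}   R0 (binary) = {r0_binary(x):.4f}")
```

Under this assumption the two parameters agree to within about $10^{-6}$ bit at $-10$ db, differ by roughly 2 per cent at 0 db, and at 20 db $R_0^*$ exceeds 3 bits per dimension while the binary $R_0$ remains below 1.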

We now consider certain signal classes that yield a bound parameter $R_0$ that is substantially as large as $R_0^*$, even for large values of $E_N/N_0$. The adverse effect of saturation is circumvented by not restricting the signal vectors $\{\mathbf{s}_i\}$ to the vertices of a hypercube. An especially convenient augmented class of allowable signals, in terms of analysis and implementation,† is one in which the components of the signal vectors are still restricted to a finite number of different values, but in which this number, say $A$, is now an integer greater than 2. The total number of allowable signals of the form

$$\mathbf{s}_i = (s_{i1}, s_{i2}, \ldots, s_{iN}), \tag{5.45a}$$

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\,\varphi_j(t) \tag{5.45b}$$

is therefore

$$A^N = 2^{N \log_2 A}. \tag{5.45c}$$

For this class of signal, saturation does not occur until

$$M = 2^{NR_N} = A^N \tag{5.45d}$$

or

$$R_N = \log_2 A. \tag{5.45e}$$

† Questions of implementation are considered in Chapter 6.
For example, if $A = 4$, the saturation value of $R_N$ is 2 bits/dimension rather than 1. Thus we may hope to obtain a bound of the form

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}$$

in which $R_0$ is greater than 1.

To complete the specification of the enlarged signal class, we must state the $A$ values permitted to the $\{s_{ij}\}$. We consider only the case in which each $s_{ij}$ can be assigned any one of $A$ amplitudes equally spaced over the interval $[-\sqrt{E_N}, +\sqrt{E_N}]$, as shown in Fig. 5.14 for $A = 8$. Such an assignment guarantees that $|\mathbf{s}_i|^2 \le NE_N$ for all $i$. For example, the 16 allowable signals when $A = 4$ and $N = 2$ are illustrated in Fig. 5.15. The set of values permitted the $\{s_{ij}\}$ is called the signal alphabet and denoted $\{a_l\}$, $l = 1, 2, \ldots, A$. The members of the alphabet are called letters, and the set of all $A^N$ allowable signal vectors is called the code base.

Figure 5.14 Possible set of values permitted the $\{s_{ij}\}$; $A = 8$. (The letters $a_1, a_2, \ldots, a_8$ are equally spaced.)
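The alphabet and code base just defined are simple to construct explicitly for small $A$ and $N$. The sketch below is illustrative only; the function names are not from the text, and $E_N$ is set to 1 in the example purely for convenience.

```python
import math
from itertools import product

def alphabet(A, E_N):
    """A amplitudes equally spaced over [-sqrt(E_N), +sqrt(E_N)]."""
    top = math.sqrt(E_N)
    step = 2.0 * top / (A - 1)
    return [-top + l * step for l in range(A)]

def code_base(A, N, E_N):
    """All A**N signal vectors whose components are alphabet letters."""
    return list(product(alphabet(A, E_N), repeat=N))

def saturation_rate(A):
    """Eq. 5.45e: the rate R_N (bits/dimension) at which all A**N vectors are used."""
    return math.log2(A)

if __name__ == "__main__":
    A, N, E_N = 4, 2, 1.0
    base = code_base(A, N, E_N)
    print("letters:", alphabet(A, E_N))
    print("code base size:", len(base), "saturation rate:", saturation_rate(A))
    print("max energy:", max(sum(c * c for c in v) for v in base), "<= N*E_N =", N * E_N)
```

For $A = 4$ and $N = 2$ this yields the 16 vectors of Fig. 5.15, each with energy no greater than $NE_N$.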

To determine $R_0$ as a function of $A$ and $E_N/N_0$, we again bound the mean probability of error, $\overline{P[\mathcal{E}]}$, over an appropriate ensemble of communication systems. Since each message may be assigned any one of the $A^N$ vectors in the code base, the total number of distinct codes (assignments of $M$ messages to code-base vectors) is $(A^N)^M = A^{NM}$. As when $A = 2$, codes in which several messages are assigned to the same vector are included in the count. For bounding $\overline{P[\mathcal{E}]}$, we consider an ensemble containing $A^{NM}$ communication systems, each of which uses a different code $\{\mathbf{s}_i\}$ together with a receiver that is optimum for that code.

We recall that $\overline{P[\mathcal{E}]}$ is the ensemble average of the probability of error of each system in the ensemble. In evaluating $\overline{P[\mathcal{E}]}$ for $A = 2$, we assigned each of the $2^{NM}$ systems equal probability, which implied

$$\overline{P[\mathcal{E}]} = 2^{-NM} \sum_{\text{all codes}} P[\mathcal{E} \mid \{\mathbf{s}_i\}].$$

When $A > 2$, the ensemble average probability of error, $\overline{P[\mathcal{E}]}$, is reduced, hence the value of $R_0$ increased, by assigning nonequal probabilities to the $A^{NM}$ systems in the ensemble. The reason is that the signal alphabet $\{a_l\}$ is asymmetric, as seen in Fig. 5.14: although each letter is equally distant from its nearest neighbor, the letters $+\sqrt{E_N}$ and $-\sqrt{E_N}$ have neighbors only on one side. These end letters are, in a sense, more distinguishable, and we anticipate that the probability of error will be smaller for systems with codes $\{\mathbf{s}_i\}$ in which letters near the ends are used more frequently than the interior letters.


Figure 5.15 The code base when $A = 4$, $N = 2$.

In order not to preclude a preference for the better codes in the analysis of $\overline{P[\mathcal{E}]}$, the assignment of a probability to each of the $A^{NM}$ systems in the ensemble is accomplished as follows. We first associate with every alphabet letter $a_l$, $l = 1, 2, \ldots, A$, a non-negative number $p_l$ such that

$$p_1 + p_2 + \cdots + p_A = 1. \tag{5.46a}$$

Next, for each system we observe its entire code $\{\mathbf{s}_i\}$ and count the total number of times, say $N_l$, $l = 1, 2, \ldots, A$, that letter $a_l$ appears therein. To this system we assign the probability

$$P[\{\mathbf{s}_i\}] = p_1^{N_1}\,p_2^{N_2} \cdots p_A^{N_A}. \tag{5.46b}$$

Since each code comprises $M$ codewords with $N$ symbols apiece, it is clear that for every system $N_1 + N_2 + \cdots + N_A = NM$. For example, a code $\{\mathbf{s}_i\}$ containing $M = 5$ members is shown in Fig. 5.16, with $N = 2$ and $A = 4$. For this code

$$N_1 = 4, \quad N_2 = 2, \quad N_3 = 2, \quad N_4 = 2.$$

If we choose $p_1 = p_4 = \tfrac{4}{10}$ and $p_2 = p_3 = \tfrac{1}{10}$, then

$$P[\{\mathbf{s}_i\}] = p_1^4\,p_2^2\,p_3^2\,p_4^2 = 4.096 \times 10^{-7}.$$

Another way of expressing this probability assignment is to state that, over the ensemble, the probability that component $s_{ij}$ will be the $l$th letter of the alphabet is just $p_l$, independent of all other components in $\mathbf{s}_i$ and in the remaining code words $\{\mathbf{s}_k\}$, $k \ne i$. With this alternative definition, the probability assigned to any code $\{\mathbf{s}_i\}$ is exactly that given in Eq. 5.46b. This interpretation assures us that the probability assignment of Eqs. 5.46 is valid.
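The assignment of Eq. 5.46b amounts to counting letters. In the sketch below the five codewords are hypothetical: they are not read from Fig. 5.16 but are chosen only to reproduce the letter counts 4, 2, 2, 2 of the example. The letter probabilities 0.4, 0.1, 0.1, 0.4 are likewise assumed, being values that yield the quoted figure $4.096 \times 10^{-7}$.

```python
from collections import Counter

def code_probability(code, letter_probs):
    """Eq. 5.46b: product over letters l of p_l ** N_l, where N_l counts
    how often letter l appears anywhere in the code."""
    counts = Counter(letter for word in code for letter in word)
    prob = 1.0
    for letter, n_l in counts.items():
        prob *= letter_probs[letter] ** n_l
    return prob

if __name__ == "__main__":
    # Hypothetical code with M = 5 codewords of N = 2 letters each, letters indexed 1..4.
    code = [(1, 1), (1, 4), (2, 3), (1, 2), (3, 4)]
    probs = {1: 0.4, 2: 0.1, 3: 0.1, 4: 0.4}      # assumed values; they sum to 1
    print(Counter(l for w in code for l in w))     # letter counts N_1..N_4
    print(code_probability(code, probs))
```

Because the probability is a product over individual components, the same number results from drawing each of the $NM$ components independently with probabilities $\{p_l\}$, which is the alternative interpretation given above.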

Figure 5.16 Code with $M = 5$, $A = 4$, $N = 2$.

By choosing the $p_l$'s associated with letters near the ends, $+\sqrt{E_N}$ and $-\sqrt{E_N}$, to be larger than the $p_l$'s associated with the interior letters, we can permit a system whose code contains a larger proportion of end letters to contribute more strongly to the average probability of error than a system whose code contains a smaller proportion. If we choose to let each $p_l$ equal $1/A$, every possible code is equally likely, and all systems will contribute equally to $\overline{P[\mathcal{E}]}$. We first calculate $R_0$ for arbitrary $\{p_l\}$; then we specialize the $\{p_l\}$ to values that maximize $R_0$.†
The procedure for obtaining an exponential bound on P[£] for the 
ensemble specified by Eqs. 5.46 is parallel to that followed in the case of 

f If, in selecting a specific code, letters are chosen for codewords independently but 
with the optimum {/>,}, the probability is high that a code with performance comparable 
to that afforded by the maximum R 0 will be obtained. 
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binary antipodal letters. We again have 


P[6 ! Ml,] = I P[G I m K Ml P[{s,}], 

all codes 


P[8 | m k , {*}] < ^ p 2 [s*> sj. 


P[S|m fc ]< I PfelllP^s,] 

all codes i=o 

(i*k) 

M-\ 

- 2 (5-47) 

i = 0 
(i&k) 

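Provided that we can obtain a bound of the form 

$$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} = \overline{P_2[\mathcal{E}]} \le 2^{-NR_0} \tag{5.48}$$

for any k and all $i \neq k$, it will follow as before that 

$$\overline{P[\mathcal{E} \mid m_k]} \le \sum_{\substack{i=0 \\ i \neq k}}^{M-1} \overline{P_2[\mathcal{E}]} < M\, 2^{-NR_0},$$

and therefore 

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}, \tag{5.49}$$

which is the result that we desire. 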

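It remains to prove the validity of Eq. 5.48 and to evaluate $R_0$. Since 
the noise is additive white Gaussian, we again have 

$$P_2[\mathbf{s}_i, \mathbf{s}_k] = Q\!\left(\frac{|\mathbf{s}_i - \mathbf{s}_k|}{\sqrt{2\mathcal{N}_0}}\right) \le \exp\left[-\frac{1}{4\mathcal{N}_0} \sum_{j=1}^{N} (s_{ij} - s_{kj})^2\right] = \prod_{j=1}^{N} \exp\left[-\frac{1}{4\mathcal{N}_0} (s_{ij} - s_{kj})^2\right],$$

where $\{s_{ij}\}$ and $\{s_{kj}\}$ are the components of $\mathbf{s}_i$ and $\mathbf{s}_k$, respectively. Averaging 
over the ensemble of codes yields 

$$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} \le E\left[\prod_{j=1}^{N} \exp\left\{-\frac{1}{4\mathcal{N}_0} (s_{ij} - s_{kj})^2\right\}\right]. \tag{5.50}$$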

Evaluation of the right-hand side of Eq. 5.50 is facilitated if we denote 
the distance between the $l$th letter and the $h$th letter of the transmitter 
alphabet $\{a_l\}$ by the symbol $d_{lh}$: 

$$d_{lh} = |a_l - a_h|; \quad l, h = 1, 2, \ldots, A. \tag{5.51a}$$

For example, with the letters of Fig. 5.14 

$$d_{lh} = |l - h|\, \frac{2\sqrt{E_N}}{A - 1}. \tag{5.51b}$$

If $s_{ij} = a_l$ and $s_{kj} = a_h$, we then have 

$$(s_{ij} - s_{kj})^2 = d_{lh}^2. \tag{5.51c}$$

We next recall that over the ensemble of codes the probability of the 
joint event $(s_{ij} = a_l,\ s_{kj} = a_h)$ is $p_l p_h$, independently of the coordinate $j$ 
we are considering and independently of all other letter assignments. Thus 

$$P[(s_{ij} - s_{kj})^2 = d_{lh}^2] = p_l p_h, \tag{5.52}$$

independently for all j, i, and k. 

The desirability of assigning probabilities to the different members of 
our ensemble of communication systems in accordance with Eqs. 5.46 
now becomes evident. The statistical independence of the $\{(s_{ij} - s_{kj})^2\}$ 
permits us to simplify Eq. 5.50 by using the fact that the expected value 
of a product of statistically independent random variables is the product 
of their expected values. Since the random variables $\{(s_{ij} - s_{kj})^2\}$, hence 
the random variables 

$$\exp\left[-\frac{1}{4\mathcal{N}_0} (s_{ij} - s_{kj})^2\right]; \quad j = 1, 2, \ldots, N,$$

are statistically independent, Eq. 5.50 yields 

$$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} \le \prod_{j=1}^{N} E\left[\exp\left\{-\frac{1}{4\mathcal{N}_0} (s_{ij} - s_{kj})^2\right\}\right].$$

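The rest is definition. From Eq. 5.52 

$$E\left[\exp\left\{-\frac{1}{4\mathcal{N}_0} (s_{ij} - s_{kj})^2\right\}\right] = \sum_{l=1}^{A} \sum_{h=1}^{A} P[(s_{ij} - s_{kj})^2 = d_{lh}^2]\, e^{-d_{lh}^2/4\mathcal{N}_0} = \sum_{l=1}^{A} \sum_{h=1}^{A} p_l p_h\, e^{-d_{lh}^2/4\mathcal{N}_0}.$$

Defining 

$$b_{lh} \triangleq e^{-d_{lh}^2/4\mathcal{N}_0} \tag{5.53a}$$

and 

$$R_0 \triangleq -\log_2 \sum_{l=1}^{A} \sum_{h=1}^{A} p_l\, b_{lh}\, p_h, \tag{5.53b}$$

we have 

$$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} \le \prod_{j=1}^{N} 2^{-R_0} = 2^{-NR_0}. \tag{5.54}$$

Since the bound of Eq. 5.54 is valid for all i and k, we can again denote 
$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]}$ by $\overline{P_2[\mathcal{E}]}$. Thus Eq. 5.48, and therefore the bound 

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}, \tag{5.55}$$

is verified, with $R_0$ given by Eqs. 5.53. The average probability of error 
for an ensemble of communication systems using codes with letters 
selected from an A-letter alphabet decreases exponentially to zero with 
increasing codeword length, N, as long as $R_N < R_0$. Only the value of 
$R_0$ has changed from when A = 2; the form of the bound is the same. 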

The bound of Eq. 5.55 has been derived for any set of A amplitudes 
$\{a_l\}$ and any set of letter probabilities $\{p_l\}$. In the special case when we 
choose all $p_l = 1/A$, the expression for $R_0$ reduces to 

$$R_0 = -\log_2 \left[\frac{1}{A^2} \sum_{l=1}^{A} \sum_{h=1}^{A} e^{-d_{lh}^2/4\mathcal{N}_0}\right]. \tag{5.56}$$

For A = 2 and $a_1 = +\sqrt{E_N}$, $a_2 = -\sqrt{E_N}$ we have $d_{11} = d_{22} = 0$, 
$d_{12} = d_{21} = 2\sqrt{E_N}$ and 

$$R_0 = -\log_2 \tfrac{1}{4}\left[e^{-E_N/\mathcal{N}_0} + e^0 + e^0 + e^{-E_N/\mathcal{N}_0}\right] = \log_2 \frac{2}{1 + e^{-E_N/\mathcal{N}_0}},$$

which agrees with the previous result of Eq. 5.36. More generally, given 
any choice of amplitudes $\{a_l\}$, Eq. 5.56 can be evaluated on a digital 
computer. This has been done for the alphabets chosen as in Fig. 5.14 
with A = 2, 3, 4, 8, 16, 32, and 64. Curves of $R_0$ as a function of $E_N/\mathcal{N}_0$ 
are shown in Fig. 5.17. We see that the upper envelope of the curves has 
small dips at the crossover points. 
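
The evaluation of Eqs. 5.53 and 5.56 on a computer can be sketched as follows. This is an illustrative sketch rather than the program used for Fig. 5.17: the function names are hypothetical, and the alphabet layout (A letters equally spaced over $[-\sqrt{E_N}, +\sqrt{E_N}]$) is assumed from the description of Fig. 5.14.

```python
from math import exp, log2, sqrt


def r0(amplitudes, probs, n0):
    """R_0 of Eqs. 5.53: -log2 of the double sum of p_l * b_lh * p_h."""
    total = 0.0
    for a_l, p_l in zip(amplitudes, probs):
        for a_h, p_h in zip(amplitudes, probs):
            # b_lh = exp(-d_lh^2 / (4 N0)) with d_lh = |a_l - a_h|
            total += p_l * p_h * exp(-((a_l - a_h) ** 2) / (4.0 * n0))
    return -log2(total)


def equispaced(num_letters, e_n):
    """Assumed layout: letters equally spaced over [-sqrt(E_N), +sqrt(E_N)]."""
    top = sqrt(e_n)
    return [-top + 2.0 * top * l / (num_letters - 1) for l in range(num_letters)]


def r0_uniform(num_letters, e_n, n0):
    """Eq. 5.56: R_0 when every letter has probability 1/A."""
    letters = equispaced(num_letters, e_n)
    return r0(letters, [1.0 / num_letters] * num_letters, n0)


# A = 2 reproduces the closed form log2(2 / (1 + exp(-E_N/N0))).
print(r0_uniform(2, 1.0, 1.0), log2(2.0 / (1.0 + exp(-1.0))))
```

At large $E_N/\mathcal{N}_0$ the off-diagonal terms $b_{lh}$ vanish and the value returned approaches the saturation level $\log_2 A$.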

In Appendix 5C we consider the problem of choosing an optimum 
probability assignment for the $\{p_l\}$, given any particular signaling alphabet 
$\{a_l\}$. The curves of $R_0$ that result when the optimum $\{p_l\}$ are used with the 
equally spaced letters of Fig. 5.14 are shown in Fig. 5.18, together with 
$R_0^*$. We see that the dips disappear and the upper envelope is smooth. 

The upper envelope of the nonoptimized curves of Fig. 5.17 is also 
included in Fig. 5.18, as dotted lines. The advantage of using optimum 
$\{p_l\}$, compared with equally likely $\{p_l\}$, is small as long as the value A is 
properly chosen. 

It is clear from Fig. 5.18 that a relatively simple ensemble of codes with 
A equally spaced letters having equal probabilities can always be chosen 
so that $R_0$ is close to $R_0^*$; for no value of $E_N/\mathcal{N}_0$ does the $R_0^*$ curve 
exceed the nonoptimum envelope by more than 35 per cent. 

The reason for the discrepancy between $R_0^*$ and the $R_0$ for multiamplitude-waveform 
sequences is easily discerned. First, the condition 
imposed by Shannon in the derivation of $R_0^*$ was that (cf. Eq. 5.43) the 
total length of each signal vector $\mathbf{s}_i$ must be no greater than $\sqrt{NE_N}$. On 
the other hand, in the derivation of $R_0$ we required that no vector component 
$s_{ij}$ could be greater than $\sqrt{E_N}$, which is sufficient but not necessary to 
satisfy Eq. 5.43. The first restriction is a constraint on the total energy 
of each signal, whereas the second is more akin to a peak-power constraint. 
If we consider the case in which the $\{a_l\}$ are equally spaced over 
the interval $[-\sqrt{E_N}, +\sqrt{E_N}]$ and equally likely, the mean signal energy 
$E[|\mathbf{s}_i|^2]$ is smaller than the allowed energy $NE_N$. 
For large A there is a factor of 3, or 4.8 db, between the allowed energy 
in the derivation of $R_0^*$ and the mean energy in the derivation of $R_0$. 
When A = 3, $E[|\mathbf{s}_i|^2] = \tfrac{2}{3} NE_N$ and the discrepancy is 2 db. Energy 
differences account for all but something less than 1 db of the discrepancy 
between $R_0^*$ and $R_0$.† 

Figure 5.17 $R_0$ for equispaced A-level amplitude modulation and $p_l = 1/A$; 
$l = 1, 2, \ldots, A$. (Abscissa: energy ratio per dimension, $10 \log_{10} E_N/\mathcal{N}_0$.) 
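
The energy factors quoted above (a factor of 3, or 4.8 db, for large A, and 2/3 of $NE_N$ for A = 3) can be reproduced numerically. The sketch below assumes equally likely letters equally spaced over $[-\sqrt{E_N}, +\sqrt{E_N}]$; the function names are hypothetical.

```python
from math import log10


def mean_energy_fraction(num_letters):
    """Mean of a_l^2 / E_N over equally likely, equally spaced letters.

    Letters are normalized so the end letters sit at -1 and +1,
    that is, at -sqrt(E_N) and +sqrt(E_N) in units of sqrt(E_N).
    """
    letters = [-1.0 + 2.0 * l / (num_letters - 1) for l in range(num_letters)]
    return sum(a * a for a in letters) / num_letters


def energy_gap_db(num_letters):
    """Ratio of allowed energy N*E_N to mean energy, in decibels."""
    return 10.0 * log10(1.0 / mean_energy_fraction(num_letters))


for num_letters in (2, 3, 4, 64, 1000):
    print(num_letters, mean_energy_fraction(num_letters), energy_gap_db(num_letters))
```

For A = 2 both letters lie at the ends, so the mean energy equals the allowed energy and the gap is 0 db; the gap grows toward $10 \log_{10} 3 \approx 4.8$ db as A increases.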

The remaining 1 db is due to the signal structure assumed for our code 
ensemble. We restricted all signal vectors to fall within a hypercube 
centered on the origin, each side of which has length $2\sqrt{E_N}$, as illustrated 
for N = 2, A = 4 in Fig. 5.15. The set of signals considered by Shannon 
is constrained only to fall within a hypersphere of radius $\sqrt{NE_N}$, as 
indicated by the dashed circle in Fig. 5.15. The additional volume for 
locating signals, internal to the sphere but external to the cube, is significant 
for large N. (The term “hypersphere” means an N-dimensional 
sphere and is defined mathematically in Section 5.5.) 

Good engineering must reflect the complexity of system implementation 
as well as system performance; as discussed in Chapter 6, there is often 
considerable merit in working with a slightly nonoptimum class of signals 
to facilitate implementation. For the additive white Gaussian noise 
channel, the multiamplitude waveform sequences are such a class. 

It is interesting to note that, for each value of A, the corresponding $R_0$ 
approximates $R_0^*$ for some range of $E_N/\mathcal{N}_0$. In every case this range 
lies several decibels below the value of $E_N/\mathcal{N}_0$ at which $R_0$ approaches 
the alphabet saturation level, $\log_2 A$. The explanation, as in the case 
A = 2, is that $R_0$ is “optimized” by choosing A large enough that the 
effects of noise, rather than the probability of choosing a bad code because 
of a shortage of signal points, dominate the value of $R_0$. 

5.5 CHANNEL CAPACITY 

The relatively simple argument used in Section 5.4 (that the probability 
of a union is bounded by the sum of the probabilities of its constituents) 
is sufficient to obtain the bound 

$$\overline{P[\mathcal{E}]} < M\, \overline{P_2[\mathcal{E}]}, \tag{5.58a}$$

and thence the result 

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}. \tag{5.58b}$$

Equation 5.58b guarantees that signal sets exist which afford communication 
through white Gaussian noise at any rate $R_N < R_0$ with arbitrarily 
low error probability. The value of N obtained by setting the right-hand 
side equal to the desired error probability is an upper bound on the 
number of dimensions necessary to achieve such performance. 

† This same consideration accounts for the discrepancy in Fig. 5.17 between the curves 
of $R_0$ for large A and for A = 2 when the $\{p_l\}$ are equal to $1/A$ and $E_N/\mathcal{N}_0$ is small. 
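
The remark that setting the right-hand side of Eq. 5.58b equal to a desired error probability bounds the required number of dimensions can be made concrete. The following is a minimal sketch with a hypothetical function name; it simply solves $2^{-N[R_0 - R_N]} \le P$ for the smallest integer N.

```python
from math import ceil, log2


def dimensions_needed(r0, rn, target_error):
    """Smallest integer N with 2**(-N * (r0 - rn)) <= target_error.

    r0, rn: R_0 and R_N in bits per dimension; the bound is useful only
    when rn < r0. target_error: desired bound on the ensemble-average
    probability of error, 0 < target_error < 1.
    """
    if not rn < r0:
        raise ValueError("Eq. 5.58b gives a decaying bound only for R_N < R_0")
    return ceil(log2(1.0 / target_error) / (r0 - rn))


# Example values (illustrative, not taken from the text).
print(dimensions_needed(r0=1.0, rn=0.5, target_error=2.0 ** -10))
```

The required N grows without limit as $R_N$ approaches $R_0$, which reflects the bounding technique rather than a limitation of the channel.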

On the other hand, Eqs. 5.58 do not imply that an arbitrarily low probability 
of error cannot be obtained for rates $R_N$ greater than $R_0$. Indeed, 
we have already seen that $R_0^* > R_0$. The central question concerning the 
ultimate limitations imposed by noise remains. 

Capacity Theorem 

A complete answer to this question is provided by specialization of a 
theorem, due to Shannon, 72,75 called the capacity theorem. Roughly 
speaking, this remarkable theorem states that there is a maximum, called 
channel capacity, to the rate at which any communication system can 
operate satisfactorily when constrained in power; operation at a rate 
greater than capacity condemns the system to a high probability of error, 
regardless of the choice of signal set or receiver. The theorem is extremely 
general and is not restricted to Gaussian channels. For such channels, 
however, it is clear that the capacity is at least as great as R 0 , since we 
have already proved the existence of systems that yield arbitrarily small 
error probabilities for any rate less than R 0 . 

Recalling that the number, D, of dimensions that can be accommodated 
per second by a bandlimited channel is not sharply specified, we state the 
capacity theorem in terms of the parameters $E_N$ and $R_N$, where $R_N = R/D$ 
again denotes the transmitter input rate in bits per dimension. The energy 
of each signal is constrained to be no greater than $NE_N$, where N is the 
dimensionality of the signal space. 

In the particular case of transmission over an additive white Gaussian 
noise channel the capacity theorem may be stated as follows: 

Theorem. There exists a constant, $C_N$, given by 

$$C_N = \tfrac{1}{2} \log_2\left(1 + 2\,\frac{E_N}{\mathcal{N}_0}\right) \tag{5.59}$$

and called the Gaussian channel capacity, with the following properties: 

Negative Statement. If $R_N > C_N$ and the number of equally likely 
messages, $M = 2^{NR_N}$, is large, the probability of error is close to 1 for 
every possible set of M transmitter signals. 

Positive Statement. If $R_N < C_N$ and M is sufficiently large, there 
exist sets of M transmitter signals such that the probability of error 
achieved with optimum receivers is arbitrarily small. 

Figure 5.19 Comparison of capacity and $R_0^*$. 

A plot of $C_N$ as a function of $E_N/\mathcal{N}_0$ is given in Fig. 5.19, together 
with $R_0^*$. We see that 

$$\tfrac{1}{2} C_N \le R_0^* \le C_N. \tag{5.60}$$

Thus the apparent limitation $R_N < R_0$ on rate evidenced by Eq. 5.58 is 
attributable to the manner of bounding and is not an inescapable attribute 
of $E_N/\mathcal{N}_0$. The bound $R_N < C_N$ is inescapable. 

The bound $R_N < C_N$ (bits/dimension) implies that 

$$R < DC_N \ \text{(bits/sec)}. \tag{5.61a}$$

The number of dimensions per second that can be accommodated by a 
bandlimited channel is bounded† (cf. Appendix 5A, Eq. 5A.6) by 

$$D \le 2W. \tag{5.61b}$$

If we achieve the upper bound D = 2W, then $E_N = P_s/D = P_s/2W$, and 
we have the well-known result 

$$R < 2WC_N = W \log_2\left(1 + \frac{P_s}{\mathcal{N}_0 W}\right) \triangleq C, \tag{5.61c}$$

where C is the capacity in bits per second and $P_s$, defined by Eq. 5.5, is the 
maximum average power allotted to any transmitted signal. 
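
Equations 5.59 and 5.61c are easily evaluated numerically. The sketch below uses hypothetical function names and takes D = 2W, as in the derivation of Eq. 5.61c; the numerical values in the example are illustrative only.

```python
from math import log2


def capacity_per_dimension(e_n, n0):
    """C_N = (1/2) log2(1 + 2 E_N / N0), in bits per dimension (Eq. 5.59)."""
    return 0.5 * log2(1.0 + 2.0 * e_n / n0)


def capacity_bits_per_second(p_s, n0, w):
    """C = 2 W C_N with E_N = P_s / (2 W), i.e. W log2(1 + P_s / (N0 W))."""
    return 2.0 * w * capacity_per_dimension(p_s / (2.0 * w), n0)


# Illustrative values: W = 1000 Hz, N0 = 1e-3 W/Hz, P_s = 3 W,
# so that P_s / (N0 W) = 3 and C = W log2(4) = 2000 bits/sec.
print(capacity_bits_per_second(p_s=3.0, n0=1e-3, w=1000.0))
```

Writing C in terms of $C_N$ keeps the per-dimension and per-second forms of the theorem visibly consistent.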


Proof of the Capacity Theorem 


This proof of Shannon’s 72 capacity theorem for the Gaussian channel 
is essentially geometric. The proof is long but straightforward. 

Sphere hardening. Let us begin by considering a signal space of N 
dimensions and a set of $M = 2^{NR_N}$ signals, each with energy less than or 
equal to $NE_N$: 

$$|\mathbf{s}_i|^2 \le NE_N; \quad \text{all } i. \tag{5.62}$$

When Eq. 5.62 is met, we say that the signals lie within an N-dimensional 
sphere‡ of radius $\sqrt{NE_N}$. 

For this proof, we introduce vectors so normalized that the size of the 
constraining sphere is independent of N: 

$$\bar{\mathbf{s}}_i \triangleq \mathbf{s}_i/\sqrt{N}; \quad i = 0, 1, \ldots, M - 1, \tag{5.63a}$$

$$\bar{\mathbf{n}} \triangleq \mathbf{n}/\sqrt{N}, \tag{5.63b}$$

$$\bar{\mathbf{r}} \triangleq \mathbf{r}/\sqrt{N} = \bar{\mathbf{s}}_i + \bar{\mathbf{n}}; \quad m = m_i, \tag{5.63c}$$

where $\mathbf{s}_i$, $\mathbf{n}$, and $\mathbf{r}$ are defined as usual in terms of orthonormal functions. 
We then have 

$$|\bar{\mathbf{s}}_i|^2 = \frac{|\mathbf{s}_i|^2}{N} \le E_N \ \text{(joules/dimension)}, \tag{5.64}$$

so that all normalized signals lie within a “signal sphere”, say $I_s$, of radius 
$\sqrt{E_N}$, where $E_N$ is the average energy available per dimension. 

† The discrepancy between the bound of Eq. 5.61b and the dimensionality theorem 
of Sect. 5.3, $D \le 2.4W$, arises out of a distinction in the conditions of validity of 
the two bounds. The distinction is discussed in App. 5A. 

‡ An N-dimensional sphere, say $I_\rho$, of radius $\rho$ and centered on the origin, is defined 
as the set of points $\mathbf{x} = (x_1, x_2, \ldots, x_N)$ such that $x_1^2 + x_2^2 + \cdots + x_N^2 \le \rho^2$. Thus 
$I_\rho = \{\mathbf{x} : |\mathbf{x}|^2 \le \rho^2\}$. The intersection of an N-dimensional sphere of radius $\rho$ with any 
two-dimensional plane passing through its center is a circle of radius $\rho$. Similarly, the 
projection of the N-dimensional sphere into any set of three dimensions (achieved by 
setting all $x_j$ equal to zero except for the three $x_j$ pertaining to the three dimensions of 
interest) is a three-dimensional sphere of radius $\rho$. For instance, 

$$I_\rho \text{ projected in first three dimensions} = \{\mathbf{x} : x_1^2 + x_2^2 + x_3^2 \le \rho^2\}.$$

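It is interesting to examine the mean-squared length of the N-dimensional 
noise vector $\bar{\mathbf{n}}$ under this normalization: 

$$|\bar{\mathbf{n}}|^2 = \frac{1}{N} |\mathbf{n}|^2 = \frac{1}{N} \sum_{j=1}^{N} n_j^2, \tag{5.65a}$$

where $\{n_j\}$ is a set of N zero-mean, statistically independent random 
variables, each with variance $\mathcal{N}_0/2$. Hence 

$$E[|\bar{\mathbf{n}}|^2] = \frac{1}{N} \sum_{j=1}^{N} \overline{n_j^2} = \frac{1}{N} \sum_{j=1}^{N} \frac{\mathcal{N}_0}{2} = \frac{\mathcal{N}_0}{2}. \tag{5.65b}$$

We see that the average squared length of the normalized noise vector is 
$\mathcal{N}_0/2$, independently of the number of dimensions N. 

Although the average squared length of the noise vector is independent 
of N, it is crucial to our proof that the variance of the squared length is not. 
Since the variance of a sum of independent random variables equals the 
sum of the variances, the variance of $|\bar{\mathbf{n}}|^2$, say $\sigma^2(|\bar{\mathbf{n}}|^2)$, is 

$$\sigma^2(|\bar{\mathbf{n}}|^2) = \sigma^2\!\left(\frac{1}{N} \sum_{j=1}^{N} n_j^2\right) = \frac{1}{N^2} \sum_{j=1}^{N} \left[\overline{n_j^4} - \left(\overline{n_j^2}\right)^2\right]. \tag{5.65c}$$

Equation 5.65c may be evaluated by invoking Eq. 2.145, which states 
that for zero-mean Gaussian random variables $\{n_j\}$ 

$$\overline{n_j^4} = 3\left(\overline{n_j^2}\right)^2.$$

Thus 

$$\sigma^2(|\bar{\mathbf{n}}|^2) = \frac{1}{N^2} \sum_{j=1}^{N} 2\left(\frac{\mathcal{N}_0}{2}\right)^2 = \frac{2}{N} \left(\frac{\mathcal{N}_0}{2}\right)^2. \tag{5.65d}$$

We see that the variance of $|\bar{\mathbf{n}}|^2$ tends to zero with increasing N. It follows 
from Chebyshev’s inequality that 

$$P\left[\,\left|\,|\bar{\mathbf{n}}|^2 - \frac{\mathcal{N}_0}{2}\right| \ge \Delta\right] \le \frac{2}{N\Delta^2} \left(\frac{\mathcal{N}_0}{2}\right)^2 \tag{5.66}$$

for any positive $\Delta$, no matter how small. 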



Equation 5.66 states that for large enough N the probability that the 
squared length of the normalized noise vector $\bar{\mathbf{n}}$ differs from its average 
value of $\mathcal{N}_0/2$ by more than $\Delta$ is arbitrarily close to 0. Since the noise 
vector is equally likely to point in any direction, we may picture the noise 
vector when N is very large as falling close to the surface of a sphere of 
radius $\sqrt{\mathcal{N}_0/2}$ without directional preference, as shown in Fig. 5.20. 
This phenomenon is referred to as sphere hardening. 
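
Sphere hardening can be observed directly by simulation. The sketch below is illustrative and not from the original text: it draws noise vectors with independent zero-mean Gaussian components of variance $\mathcal{N}_0/2$, forms the normalized squared length $|\mathbf{n}|^2/N$, and compares it with the analytic variance of Eq. 5.65d and the Chebyshev bound of Eq. 5.66. All function names are hypothetical.

```python
import random
from math import sqrt


def normalized_sq_length(n_dims, n0, rng):
    """One sample of |n|^2 / N for N independent N(0, N0/2) components."""
    sigma = sqrt(n0 / 2.0)
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n_dims)) / n_dims


def variance_of_sq_length(n_dims, n0):
    """Analytic variance (2/N) * (N0/2)^2 from Eq. 5.65d."""
    return (2.0 / n_dims) * (n0 / 2.0) ** 2


def chebyshev_bound(n_dims, n0, delta):
    """Bound of Eq. 5.66 on P[| |n|^2/N - N0/2 | >= delta], capped at 1."""
    return min(1.0, variance_of_sq_length(n_dims, n0) / delta ** 2)


rng = random.Random(0)  # fixed seed so the run is repeatable
for n_dims in (10, 100, 10000):
    sample = normalized_sq_length(n_dims, 1.0, rng)
    print(n_dims, sample, chebyshev_bound(n_dims, 1.0, 0.1))
```

As N grows, the samples cluster ever more tightly around $\mathcal{N}_0/2$ and the Chebyshev bound shrinks in proportion to 1/N.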


Figure 5.20 Sphere hardening. (The noise lies in the indicated region 
with high probability for large N.) 

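Insight into the phenomenon of sphere hardening may be obtained by 
calculating the probability that $\bar{\mathbf{n}}$ falls into the shell between concentric 
spheres of radius $\rho - \Delta$ and $\rho$, with $\Delta/\rho \ll 1$. Since the components of 
$\bar{\mathbf{n}}$ are zero-mean Gaussian random variables of variance $\mathcal{N}_0/2N$, the 
density function of the normalized noise vector is 

$$p_{\bar{\mathbf{n}}}(\boldsymbol{\alpha}) = \frac{1}{(\pi \mathcal{N}_0/N)^{N/2}} \exp\left(-\frac{N}{\mathcal{N}_0} |\boldsymbol{\alpha}|^2\right). \tag{5.67a}$$

When $\Delta/\rho$ is small, 

$$p_{\bar{\mathbf{n}}}(\boldsymbol{\alpha}) \approx \frac{1}{(\pi \mathcal{N}_0/N)^{N/2}} \exp\left(-\frac{N}{\mathcal{N}_0} \rho^2\right), \quad \text{for } \rho - \Delta \le |\boldsymbol{\alpha}| \le \rho. \tag{5.67b}$$

Thus 

$$P\left[\rho\left(1 - \frac{\Delta}{\rho}\right) \le |\bar{\mathbf{n}}| \le \rho\right] \approx \frac{1}{(\pi \mathcal{N}_0/N)^{N/2}} \exp\left(-\frac{N}{\mathcal{N}_0} \rho^2\right) [\text{volume of shell}].$$

In Appendix 5D we show (Eq. 5D.8) that the volume of the shell is 
proportional to $\rho^N (\Delta/\rho)$. Thus the probability that $\bar{\mathbf{n}}$ lies within the shell 
is proportional to the product of two factors, one of which ($e^{-N\rho^2/\mathcal{N}_0}$) 
decreases and the other of which ($\rho^N$) increases sharply with increasing $\rho$. 
For $\Delta/\rho$ = a constant $\ll 1$ we have 

$$P\left[\rho\left(1 - \frac{\Delta}{\rho}\right) \le |\bar{\mathbf{n}}| \le \rho\right] \propto \left[\rho \exp\left(-\frac{\rho^2}{\mathcal{N}_0}\right)\right]^N. \tag{5.67c}$$

The right-hand side of Eq. 5.67c has a single maximum at $\rho^2 = \mathcal{N}_0/2$, 
as shown in Fig. 5.21. As N becomes large, the consequence is that only 
values of $|\bar{\mathbf{n}}|^2$ in the vicinity of this maximum, $\mathcal{N}_0/2$, have significant 
probability. Figure 5.21 provides a graphic demonstration of the 
“sphere-hardening” phenomenon. 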
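
The location and sharpening of the maximum in Eq. 5.67c can be checked numerically. The sketch below is illustrative (hypothetical function names, a simple grid search rather than calculus) and works with logarithms so that large N does not cause overflow.

```python
from math import exp, log, sqrt


def log_shell_weight(rho, n_dims, n0):
    """log of [rho * exp(-rho^2 / N0)]^N, the rho-dependent factor in Eq. 5.67c."""
    return n_dims * (log(rho) - rho ** 2 / n0)


def peak_radius(n_dims, n0, grid=2000):
    """Grid search over (0, 2*sqrt(N0)] for the radius maximizing the shell weight."""
    radii = [2.0 * sqrt(n0) * (k + 1) / grid for k in range(grid)]
    return max(radii, key=lambda r: log_shell_weight(r, n_dims, n0))


def relative_weight(rho, n_dims, n0):
    """Shell weight at rho relative to its value at the peak rho^2 = N0/2."""
    peak = sqrt(n0 / 2.0)
    return exp(log_shell_weight(rho, n_dims, n0) - log_shell_weight(peak, n_dims, n0))


print(peak_radius(50, 2.0))  # close to sqrt(N0/2) = 1 for N0 = 2
for n_dims in (10, 100, 1000):
    print(n_dims, relative_weight(1.2, n_dims, 2.0))
```

The peak stays at $\rho^2 = \mathcal{N}_0/2$ for every N, while the relative weight at any other radius falls off exponentially in N.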


Figure 5.21 Behavior of $P[\rho - \Delta \le |\bar{\mathbf{n}}| \le \rho]$ with $\rho$. 

Proof of the negative statement. The negative part of the capacity 
theorem states that the probability of error tends to one with increasing 
N if the rate exceeds capacity. The statement is readily proved by means 
of sphere-hardening arguments. We first recall that the receiver for any 
set of transmitter signals $\{\mathbf{s}_i\}$ is defined by a set of decision regions. If the 
decision region associated with the normalized signal $\bar{\mathbf{s}}_i$ is significantly 
smaller than a sphere of radius $\sqrt{\mathcal{N}_0/2}$ centered on $\bar{\mathbf{s}}_i$, the probability that 
($\bar{\mathbf{r}} = \bar{\mathbf{s}}_i + \bar{\mathbf{n}}$) will fall into this decision region must, by sphere-hardening 
arguments, tend to zero as N increases. Hence decision regions must be 
comparable to or larger than spheres of radius $\sqrt{\mathcal{N}_0/2}$. The negative 
statement rests on the observation that if the number of signals is too large 
the typical decision region will be forced by volume (power) constraints 
to have an effective size smaller than that of a sphere of radius $\sqrt{\mathcal{N}_0/2}$. 

To make the proof precise, we first show that sphere hardening is also 
exhibited by the received signal; that is, with high probability $\bar{\mathbf{r}}$ will fall 
within a sphere of radius $\sqrt{E_N + \mathcal{N}_0/2 + \Delta}$. Proceeding as with $|\bar{\mathbf{n}}|^2$, 
we have, when $\mathbf{s}_k$ is transmitted, 

$$|\bar{\mathbf{r}}|^2 = |\bar{\mathbf{s}}_k + \bar{\mathbf{n}}|^2 = |\bar{\mathbf{s}}_k|^2 + 2\,\bar{\mathbf{s}}_k \cdot \bar{\mathbf{n}} + |\bar{\mathbf{n}}|^2 = |\bar{\mathbf{s}}_k|^2 + \frac{2}{N} \sum_{j=1}^{N} s_{kj} n_j + |\bar{\mathbf{n}}|^2, \tag{5.68a}$$

where $\bar{\mathbf{s}}_k = \frac{1}{\sqrt{N}} (s_{k1}, s_{k2}, \ldots, s_{kN})$. Since each noise variable has zero 
mean, 

$$E[|\bar{\mathbf{r}}|^2] = |\bar{\mathbf{s}}_k|^2 + \frac{\mathcal{N}_0}{2} + \frac{2}{N} \sum_{j=1}^{N} s_{kj}\, E[n_j] = |\bar{\mathbf{s}}_k|^2 + \frac{\mathcal{N}_0}{2} \le E_N + \frac{\mathcal{N}_0}{2}. \tag{5.68b}$$

Next, denoting the variance of the squared length of $\bar{\mathbf{r}}$ by $\sigma^2(|\bar{\mathbf{r}}|^2)$ and 
noting that the variance of a sum of two random variables is never greater 
than twice the sum of the variances (a consequence of the inequality 
$(a + b)^2 \le 2(a^2 + b^2)$), when $\mathbf{s}_k$ is transmitted we have 

$$\sigma^2(|\bar{\mathbf{r}}|^2) \le 2\left[\sigma^2(2\,\bar{\mathbf{s}}_k \cdot \bar{\mathbf{n}}) + \sigma^2(|\bar{\mathbf{n}}|^2)\right]$$
$$= 2\left[\frac{4}{N^2} \sum_{j=1}^{N} s_{kj}^2\, \frac{\mathcal{N}_0}{2} + \frac{2}{N} \left(\frac{\mathcal{N}_0}{2}\right)^2\right]$$
$$= 2\left[\frac{4}{N} |\bar{\mathbf{s}}_k|^2\, \frac{\mathcal{N}_0}{2} + \frac{2}{N} \left(\frac{\mathcal{N}_0}{2}\right)^2\right]$$
$$\le \frac{2}{N} \left[2 E_N \mathcal{N}_0 + 2\left(\frac{\mathcal{N}_0}{2}\right)^2\right]. \tag{5.68c}$$

The variance of the squared length of the received signal also vanishes 
with increasing N. Hence the received signal tends to be close to the 
surface of a sphere of radius $\sqrt{|\bar{\mathbf{s}}_k|^2 + \mathcal{N}_0/2}$. In particular, 

$$P\left[|\bar{\mathbf{r}}|^2 \ge E_N + \frac{\mathcal{N}_0}{2} + \Delta\right] \le \frac{\sigma^2(|\bar{\mathbf{r}}|^2)}{\Delta^2} \tag{5.69}$$

and tends to zero as N gets large. This tendency of the received vector to 
fall within a sphere of fixed diameter sharply limits the effective volume 
of the decision regions. 

We are now ready to prove the negative capacity theorem. We show 
that for any small positive quantity $\epsilon$ the probability of correct decision 
in the transmission of one of $M = 2^{NR_N}$ equally likely messages is less 
than $\epsilon$ for sufficiently large N whenever $R_N > C_N = \tfrac{1}{2} \log_2 (1 + 2E_N/\mathcal{N}_0)$, 
regardless of the choice of signal set and receiver. 

According to the sphere-hardening argument, for large N the received 
vector $\bar{\mathbf{r}}$ is effectively constrained to lie within a sphere, say $I_r$, of radius 
$\sqrt{E_N + \mathcal{N}_0/2 + \Delta}$. Writing the probability of correct decision as the 
sum of two terms, 

$$P[\mathcal{C}] = P[\mathcal{C}, \bar{\mathbf{r}} \text{ in } I_r] + P[\mathcal{C}, \bar{\mathbf{r}} \text{ outside } I_r], \tag{5.70a}$$

we observe that the second term satisfies, for any $\epsilon > 0$, the inequalities 

$$P[\mathcal{C}, \bar{\mathbf{r}} \text{ outside } I_r] \le P[\bar{\mathbf{r}} \text{ outside } I_r] < \frac{\epsilon}{2}. \tag{5.70b}$$

The last inequality follows, for large enough N, from Eq. 5.69. 

It remains to be shown that the first term in Eq. 5.70a is also less than 
$\epsilon/2$ for sufficiently large N whenever $R_N > C_N$. 

Let $I_i'$, $i = 0, 1, \ldots, M - 1$, denote that part of the decision region 
for the $i$th signal lying entirely within $I_r$, as shown in Fig. 5.22; let $V_i$ 
denote the volume of $I_i'$ and let $V_r$ denote the volume of $I_r$. Because the 
decision regions are disjoint, 

$$V_r = V_0 + V_1 + \cdots + V_{M-1} \tag{5.71}$$

and 

$$P[\mathcal{C}, \bar{\mathbf{r}} \text{ in } I_r] = \sum_{i=0}^{M-1} P[\bar{\mathbf{r}} \text{ in } I_i' \mid m_i]\, P[m_i] = \frac{1}{M} \sum_{i=0}^{M-1} P[\bar{\mathbf{r}} \text{ in } I_i' \mid m_i]. \tag{5.72}$$

We now observe that, for each i, 

$$P[\bar{\mathbf{r}} \text{ in } I_i' \mid m_i] \le P[|\bar{\mathbf{n}}| \le \rho_i], \tag{5.73}$$

where $\rho_i$ is defined to be the radius of an N-dimensional sphere of volume 
$V_i$ (this sphere and $I_i'$ both have the same volume). Equation 5.73 states 
that no decision region of given volume is better than a spherical decision 
region of the same volume centered on the signal. Proof of the optimality 
of spheres follows immediately from the observation that $p_{\bar{\mathbf{n}}}$ is a spherically 
symmetric density function decreasing monotonically with increasing $|\bar{\mathbf{n}}|$. 
As a consequence, any volume element lying farther from $\bar{\mathbf{s}}_i$ than the 
radius $\rho_i$ has less probability of containing $\bar{\mathbf{r}} = \bar{\mathbf{s}}_i + \bar{\mathbf{n}}$ than it would if it 
were located within a distance $\rho_i$ from the signal. On the other hand, all 
volume elements in the sphere do lie within distance $\rho_i$ from $\bar{\mathbf{s}}_i$. 

Substituting Eq. 5.73 in Eq. 5.72, we have 

$$P[\mathcal{C}, \bar{\mathbf{r}} \text{ in } I_r] \le \frac{1}{M} \sum_{i=0}^{M-1} P[|\bar{\mathbf{n}}| \le \rho_i]. \tag{5.74}$$

Each term on the right-hand side of this inequality is the probability that 
$\bar{\mathbf{n}}$ will lie on or within a sphere of volume $V_i$. The right-hand side is thus 
the arithmetic mean of the probabilities of noise falling in spheres of 
radii $\rho_0, \rho_1, \ldots, \rho_{M-1}$, as indicated in Fig. 5.23a. 

Figure 5.23 Optimality of equal volume spheres. The volume element dV would 
contribute more to $P[\mathcal{C}, \bar{\mathbf{r}} \text{ in } I_r]$ if it were located closer to a sphere center. 
Panel (b): $\sum_{i=0}^{M-1} V_i = V_r = MV^*$. 

If the original decision regions $\{I_i'\}$ all contained the same volume, we 
would have (from Eq. 5.71) $V_i = V_r/M$ for all i, 
and each of the $\rho_i$ would equal the radius, say $\rho^*$, of the sphere of volume 
$V_r/M$. In this case every volume element dV in the set of M spheres 
would lie at a distance no greater than $\rho^*$ from the center of one of the 
spheres, as shown in Fig. 5.23b. On the other hand, when the $V_i$ (hence 
the $\rho_i$) are not all equal, some volume elements are more distant than $\rho^*$ 
from a center and therefore contribute less probability to the sum in 
Eq. 5.74. It follows that 

$$\sum_{i=0}^{M-1} P[|\bar{\mathbf{n}}| \le \rho_i] \le M\, P[|\bar{\mathbf{n}}| \le \rho^*]$$

and 

$$P[\mathcal{C}, \bar{\mathbf{r}} \text{ in } I_r] \le P[|\bar{\mathbf{n}}| \le \rho^*]. \tag{5.75}$$

Thus the over-all probability of correct decision for any specified set of 
decision regions is bounded above by the probability that the noise falls 
into a spherical decision region of volume $V_r/M$. 

Whenever the number, M, of signals is too large, the volume $V_r/M$ is 
less than the volume of the sphere of radius $\sqrt{\mathcal{N}_0/2 - \Delta}$; that is, $\rho^* \le 
\sqrt{\mathcal{N}_0/2 - \Delta}$. Again invoking the sphere-hardening argument, we have, 
for large enough N, 

$$P[|\bar{\mathbf{n}}| \le \rho^*] < \frac{\epsilon}{2}, \quad \text{if } \rho^* \le \sqrt{\mathcal{N}_0/2 - \Delta}. \tag{5.76}$$

For such large M and N, we can combine Eqs. 5.75 and 5.76 and obtain 
the desired result 

$$P[\mathcal{C}, \bar{\mathbf{r}} \text{ in } I_r] < \frac{\epsilon}{2},$$

which with Eqs. 5.70 implies 

$$P[\mathcal{C}] < \epsilon.$$

We now determine the point beyond which the number of signals is too 
large and Eq. 5.76 is valid. In Appendix 5D (Eq. 5D.5) we prove that the 
volume V and radius $\rho$ of an N-dimensional sphere are related by 

$$V = B_N \rho^N, \tag{5.77a}$$

where $B_N$ is a positive constant that depends only on N. Thus the statement 
that $\rho^* \le \sqrt{\mathcal{N}_0/2 - \Delta}$ or the equivalent statement that $V_r/M$ is less 
than the volume of a sphere of radius $\sqrt{\mathcal{N}_0/2 - \Delta}$ may be written 

$$\frac{V_r}{M} \le B_N \left(\frac{\mathcal{N}_0}{2} - \Delta\right)^{N/2}. \tag{5.77b}$$

Since $V_r$ is the volume of a sphere of radius $\sqrt{E_N + \mathcal{N}_0/2 + \Delta}$, Eq. 5.77b 
becomes 

$$\frac{B_N (E_N + \mathcal{N}_0/2 + \Delta)^{N/2}}{M} \le B_N \left(\frac{\mathcal{N}_0}{2} - \Delta\right)^{N/2}$$

or 

$$M \ge \left(\frac{E_N + \mathcal{N}_0/2 + \Delta}{\mathcal{N}_0/2 - \Delta}\right)^{N/2}. \tag{5.77c}$$

Finally, since $M = 2^{NR_N}$, Eq. 5.77c may be written 

$$2^{NR_N} \ge \left(\frac{E_N + \mathcal{N}_0/2 + \Delta}{\mathcal{N}_0/2 - \Delta}\right)^{N/2}$$

or 

$$R_N \ge \frac{1}{2} \log_2 \frac{E_N + \mathcal{N}_0/2 + \Delta}{\mathcal{N}_0/2 - \Delta}. \tag{5.77d}$$

Clearly, if 

$$R_N > \frac{1}{2} \log_2\left(1 + 2\,\frac{E_N}{\mathcal{N}_0}\right) = \frac{1}{2} \log_2 \frac{E_N + \mathcal{N}_0/2}{\mathcal{N}_0/2}, \tag{5.78a}$$

we can take $\Delta$ sufficiently small that the inequality of Eq. 5.77d, hence of 
Eq. 5.76, is satisfied. Consequently, the condition $R_N > C_N$ implies that 

$$P[\mathcal{C}] < \epsilon, \tag{5.78b}$$

or 

$$P[\mathcal{E}] > 1 - \epsilon. \tag{5.78c}$$

Equation 5.78a provides an upper bound on the number of message bits 
per dimension which, if exceeded, causes $P[\mathcal{E}]$ to be close to 1 for large N. 
We summarize the steps in the foregoing proof by a sequence of equations: 

$$P[\mathcal{C}] = P[\mathcal{C}, \bar{\mathbf{r}} \text{ in } I_r] + P[\mathcal{C}, \bar{\mathbf{r}} \text{ outside of } I_r]$$
$$\le \frac{1}{M} \sum_{i=0}^{M-1} P[\bar{\mathbf{r}} \text{ in } I_i' \mid m_i] + \frac{\epsilon}{2}$$
$$\le \frac{1}{M} \sum_{i=0}^{M-1} P[|\bar{\mathbf{n}}| \le \rho_i] + \frac{\epsilon}{2}$$
$$\le P[|\bar{\mathbf{n}}| \le \rho^*] + \frac{\epsilon}{2}$$
$$< \epsilon.$$

This sequence is valid if 

$$\sqrt{\mathcal{N}_0/2 - \Delta} \ge \rho^* = \left(\frac{V_r}{B_N M}\right)^{1/N} = \frac{\sqrt{E_N + \mathcal{N}_0/2 + \Delta}}{M^{1/N}},$$

which implies 

$$M > \left(1 + 2\,\frac{E_N}{\mathcal{N}_0}\right)^{N/2}$$

or 

$$R_N > \frac{1}{2} \log_2\left(1 + 2\,\frac{E_N}{\mathcal{N}_0}\right).$$

The derivation of Eqs. 5.78 is completely independent of the particular 
set of transmitter vectors $\{\mathbf{s}_i\}$; the result is valid for any set that meets 
the signal energy constraint of Eq. 5.62. Thus the negative statement of 
the channel capacity theorem is proved. Since transmission with a set of 
$2^{NR_N}$ vectors having N components includes the possibility of k transmissions, 
each with a set of $2^{NR_N/k}$ vectors having $N/k$ components, the 
theorem holds true for any transmission strategy. We see that the probability 
of communicating a block of $RT = NR_N$ bits without any error 
whatsoever must approach zero as T grows large if $R_N$ exceeds $C_N$. 
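
The role of the margin $\Delta$ in the negative proof can be examined numerically: the rate threshold on the right-hand side of Eq. 5.77d lies above $C_N$ for every $\Delta > 0$ and approaches $C_N$ as $\Delta$ tends to zero, which is why any $R_N > C_N$ eventually satisfies it. The sketch below uses hypothetical function names and illustrative values.

```python
from math import log2


def capacity_per_dimension(e_n, n0):
    """C_N = (1/2) log2(1 + 2 E_N / N0) (Eq. 5.59)."""
    return 0.5 * log2(1.0 + 2.0 * e_n / n0)


def rate_threshold(e_n, n0, delta):
    """Right-hand side of Eq. 5.77d; requires 0 <= delta < N0/2."""
    if not 0.0 <= delta < n0 / 2.0:
        raise ValueError("delta must satisfy 0 <= delta < N0/2")
    return 0.5 * log2((e_n + n0 / 2.0 + delta) / (n0 / 2.0 - delta))


e_n, n0 = 1.0, 1.0
print("C_N =", capacity_per_dimension(e_n, n0))
for delta in (0.2, 0.1, 0.01, 0.001):
    print(delta, rate_threshold(e_n, n0, delta))
```

For a given $R_N > C_N$ one would pick $\Delta$ small enough that the threshold falls below $R_N$, and then N large enough for the sphere-hardening bounds to hold with that $\Delta$.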

This proof is described as “sphere packing.” It is a negative proof, in 
that no claim is made about the existence of signals $\{\mathbf{s}_i\}$ such that the 
decision regions $\{I_i\}$ actually are spheres of equal radius. Clearly, 
geometry does not permit this. The essence of the argument is simply that the 
“packed spheres” idealization implies a bound on performance that no 
realizable set of signals can surpass. 

Proof of the positive statement. We now prove the positive channel 
capacity statement that if $R_N < C_N$ then for any positive number $\epsilon$ (no 
matter how small) there exists a large enough value of N and a set of 
$M = 2^{NR_N}$ signals $\{\mathbf{s}_i\}$ such that the attainable probability of error is less 
than $\epsilon$. The proof is complicated by two facts: first, in general it is 
not possible to exhibit explicitly such a set of signals $\{\mathbf{s}_i\}$, and, second, 
even if such a set could be exhibited, the calculation of its probability of 
error would be enormously difficult. These complications, which we 
encountered before in connection with $R_0$, may again be circumvented by 
considering not just one communication system but rather a whole 
ensemble of systems, each consisting of a transmitter, channel, and optimum 
receiver. As before, we construct our ensemble in such a way that 
the mean probability of error, $\overline{P[\mathcal{E}]}$, may be easily calculated. We prove 
the theorem by showing that $\overline{P[\mathcal{E}]} < \epsilon$ for sufficiently large N; the 
ensemble must then contain individual systems for which the probability 
of error is also less than $\epsilon$. 

Specification of Codes for the Ensemble of Systems. The capacity 
theorem for the Gaussian channel concerns normalized N-dimensional 
signals $\{\bar{\mathbf{s}}_i\}$ each of which satisfies the average power constraint 

$$|\bar{\mathbf{s}}_i|^2 \le E_N; \quad i = 0, 1, \ldots, M - 1, \tag{5.79a}$$

where, as before, 

$$\bar{\mathbf{s}}_i = \mathbf{s}_i/\sqrt{N}. \tag{5.79b}$$

Since any vector $\bar{\mathbf{s}}_i$ may lie anywhere in the N-dimensional signal sphere 
$I_s$ implied by Eq. 5.79a, the codes of an ensemble of systems can be 
specified by stating an appropriate density function over $I_s$, say 

$$p_{\{\bar{\mathbf{s}}_i\}} \triangleq p_{\bar{\mathbf{s}}_0, \bar{\mathbf{s}}_1, \ldots, \bar{\mathbf{s}}_{M-1}}. \tag{5.80}$$

In terms of the density function, the ensemble average probability of 
error is 

$$\overline{P[\mathcal{E}]} = \int P[\mathcal{E} \mid \{\bar{\mathbf{s}}_i\} = \boldsymbol{\gamma}]\; p_{\{\bar{\mathbf{s}}_i\}}(\boldsymbol{\gamma})\, d\boldsymbol{\gamma}. \tag{5.81}$$

Since there are M vectors $\bar{\mathbf{s}}_i$, and each comprises N components, $\boldsymbol{\gamma}$ is an 
NM-dimensional vector. The multiple integral of Eq. 5.81 is over all NM 
arguments. 

A simple and convenient choice for $p_{\{\bar{\mathbf{s}}_i\}}$ which facilitates calculation of 
$\overline{P[\mathcal{E}]}$ and also satisfies the constraint of Eq. 5.79a is 

$$p_{\bar{\mathbf{s}}_i}(\boldsymbol{\alpha}) = \begin{cases} \dfrac{1}{V_s}, & \text{for } |\boldsymbol{\alpha}|^2 \le E_N \\[4pt] 0, & \text{for } |\boldsymbol{\alpha}|^2 > E_N \end{cases}; \quad \text{all } i, \tag{5.82a}$$

and 

$$p_{\{\bar{\mathbf{s}}_i\}} = \prod_{i=0}^{M-1} p_{\bar{\mathbf{s}}_i}, \tag{5.82b}$$

where $V_s$ denotes the volume of $I_s$. Equations 5.82 state that over the 
ensemble of systems the signal vectors are statistically independent and the 
probability that any signal vector will fall outside of the signal sphere $I_s$ is 
zero. Furthermore, if $I$ is a region of volume V entirely contained within 
the signal sphere, 

$$P[\bar{\mathbf{s}}_i \text{ in } I] = \int_I p_{\bar{\mathbf{s}}_i}(\boldsymbol{\alpha})\, d\boldsymbol{\alpha} = \frac{1}{V_s} \int_I d\boldsymbol{\alpha} = \frac{V}{V_s}. \tag{5.82c}$$

Thus, if we select one system at random from the ensemble for examination, 
the probability that the signal, $\bar{\mathbf{s}}_i$, assigned to the $i$th message is 
located in a region $I$ of the signal sphere is directly proportional to the 
volume of the region $I$. 

Although $\bar{\mathbf{s}}_i$ is equally likely to lie in any volume element within the 
signal sphere, it is not true that $\bar{\mathbf{s}}_i$ is equally likely to lie at any radius $\rho$, 
$0 \le \rho \le \sqrt{E_N}$. Indeed, for any $\Delta > 0$, 

$$P\left[|\bar{\mathbf{s}}_i| \le \sqrt{E_N - \Delta}\right] = P\left[\bar{\mathbf{s}}_i \text{ in sphere of radius } \sqrt{E_N - \Delta}\right]$$
$$= \frac{\text{volume of sphere of radius } \sqrt{E_N - \Delta}}{V_s}$$
$$= \frac{B_N (E_N - \Delta)^{N/2}}{B_N E_N^{N/2}} = \left(1 - \frac{\Delta}{E_N}\right)^{N/2}, \tag{5.83}$$

which is very close to zero for large N. Equation 5.83 is a concomitant 
of the fact that almost all of the volume in a high-dimensional sphere is 
located near the surface. The probability assignment of Eq. 5.82 therefore 
implies that nearly all of the signals $\{\bar{\mathbf{s}}_i\}$ have energy close to $E_N$. 
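
The concentration of volume near the surface expressed by Eq. 5.83 is easy to tabulate. The following is a minimal sketch with a hypothetical function name; it evaluates $(1 - \Delta/E_N)^{N/2}$, the probability that a signal drawn uniformly from the signal sphere has normalized energy no greater than $E_N - \Delta$.

```python
def fraction_inside(delta, e_n, n_dims):
    """Eq. 5.83: P[|s_i|^2 <= E_N - delta] for s_i uniform in the signal sphere."""
    if not 0.0 <= delta <= e_n:
        raise ValueError("delta must satisfy 0 <= delta <= E_N")
    return (1.0 - delta / e_n) ** (n_dims / 2.0)


# Fraction of signals with energy more than 1 per cent below E_N.
for n_dims in (2, 20, 200, 2000):
    print(n_dims, fraction_inside(0.01, 1.0, n_dims))
```

Even a 1 per cent energy margin excludes all but a negligible fraction of the ensemble once N reaches a few thousand.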

Calculation of $\overline{P[\mathcal{E}]}$. We now show that over this ensemble of communication 
systems $\overline{P[\mathcal{E}]} < \epsilon$ if N is sufficiently large. Note that $\overline{P[\mathcal{E}]}$ depends 
on three statistically independent sets of random variables: 

1. The choice of message, $m_k$, with $P[m_k] = 1/M$. 

2. The noise $\bar{\mathbf{n}}$, with $p_{\bar{\mathbf{n}}}$ given by Eq. 5.67a. 

3. The choice of code $\{\bar{\mathbf{s}}_i\}$, with $p_{\{\bar{\mathbf{s}}_i\}}$ given by Eq. 5.82. 

Thus we may calculate $\overline{P[\mathcal{E}]}$ from the conditional probability of error 

$$P[\mathcal{E} \mid m_k, \bar{\mathbf{n}}, \{\bar{\mathbf{s}}_i\}] = \begin{cases} 0 & \text{if } m_k, \bar{\mathbf{n}} \text{ and } \{\bar{\mathbf{s}}_i\} \text{ are such that } |\bar{\mathbf{s}}_k - (\bar{\mathbf{s}}_k + \bar{\mathbf{n}})| < |\bar{\mathbf{s}}_i - (\bar{\mathbf{s}}_k + \bar{\mathbf{n}})| \\ & \text{for all } i \neq k, \text{ so that no error is made,} \\ 1 & \text{otherwise} \end{cases} \tag{5.84a}$$

by first multiplying by 

$$P[m_k]\; p_{\bar{\mathbf{n}}}\; p_{\{\bar{\mathbf{s}}_i\}}, \tag{5.84b}$$

then integrating out the continuous variables $\{\bar{\mathbf{s}}_i\}$ and $\bar{\mathbf{n}}$, and finally 
summing over the index k. 

Clearly, the order in which the conditioning random variables are 
integrated out does not affect the value of $\overline{P[\mathcal{E}]}$. But eliminating $\bar{\mathbf{n}}$ and k 
first amounts (for each code $\{\bar{\mathbf{s}}_i\}$ in the ensemble) to evaluating $P[\mathcal{E} \mid \{\bar{\mathbf{s}}_i\}]$. 
This problem, finding the $P[\mathcal{E}]$ for a specific code, we have already forsworn 
as too difficult. Once more, the crucial advantage of the random-coding 
argument is that it permits us to eliminate the $\{\bar{\mathbf{s}}_i\}$ first (to integrate 
over the ensemble of systems first) and thereby to simplify the computation. 
We therefore proceed in the following order: 

i. Eliminate (that is, integrate over) $\{\bar{\mathbf{s}}_i\}$ for all $i \neq k$ to obtain 
$\overline{P[\mathcal{E} \mid m_k, \bar{\mathbf{s}}_k, \bar{\mathbf{n}}]}$. 

ii. Eliminate $\bar{\mathbf{s}}_k$ and $\bar{\mathbf{n}}$ to obtain $\overline{P[\mathcal{E} \mid m_k]}$. 

iii. Eliminate $m_k$ to obtain $\overline{P[\mathcal{E}]}$. 

It is the fact that step i can be performed by use of a geometric argument 
that permits a simple proof. 

(i) Elimination of $\{\mathbf{s}_i\}$, $i \neq k$. In calculating $P[\mathcal{E} \mid m_k, \mathbf{s}_k, \mathbf{n}]$ the
transmitted vector and the disturbing noise have, by definition, the fixed known
values $\mathbf{s}_k$ and $\mathbf{n}$. These two vectors together specify a two-dimensional
plane intersecting the N-dimensional signal sphere $I_s$ in a circle of radius
$\sqrt{E_N}$, as shown in Fig. 5.24. All points in the signal space that are closer
than $\mathbf{s}_k$ to the received vector $\mathbf{r} = \mathbf{s}_k + \mathbf{n}$ are located in a "noise" sphere
around $\mathbf{r}$ of radius $|\mathbf{n}|$. Since we have already stipulated that each receiver
in the ensemble of communication systems is optimum, the presence of
one or more of the other signals in this noise sphere will cause an error.
All signals, however, are confined to $I_s$. Thus the intersection of $I_s$ and
the noise sphere centered on $\mathbf{r}$ forms the locus of all allowable transmitter
signals that cause an error, given $\mathbf{s}_k$ transmitted and $\mathbf{r} = \mathbf{s}_k + \mathbf{n}$
received.

Figure 5.24 Plane containing $\mathbf{s}_k$, $\mathbf{n}$, $\mathbf{r}$, and origin.

This locus is an N-dimensional solid whose projection onto the plane of
$\mathbf{s}_k$ and $\mathbf{n}$ is the crosshatched region shown in Fig. 5.25a. Furthermore,
this N-dimensional solid has an identical lens-shaped cross section when
projected onto any plane containing $\mathbf{r}$ and the origin. Consequently, this
solid, to which we shall henceforth refer as an N-dimensional lens, is
completely contained within an N-dimensional sphere of radius h
centered on the point O'. This follows from the fact that the cross
section of this sphere, when projected on any plane containing $\mathbf{r}$, is a circle
of radius h centered on O'. Thus the volume of the lens, $V_{\text{lens}}$, is bounded by

$$
V_{\text{lens}} < B_N h^N, \tag{5.85}
$$

a fact to which we shall return later. It is helpful when visualizing the
geometrical relationships just described to consider the case in which the
spheres are three-dimensional (N = 3), illustrated in Fig. 5.25b.

Figure 5.25a Projection of locus of signals causing error.


Given $\mathbf{s}_k$ and $\mathbf{n}$, an error occurs in any system in the ensemble whose
code includes at least one other signal vector within the lens. From
Eq. 5.82c the probability of the set of systems whose ith signal vector, $\mathbf{s}_i$,
lies in the lens is

$$
P[\mathbf{s}_i \text{ in lens}] = \frac{V_{\text{lens}}}{V_s}; \qquad i \neq k.
$$

There are (M - 1) nontransmitted vectors in each code. Since the
probability of a union of events is bounded above by the sum of their
probabilities, it follows that the probability of the set of systems having
one or more nontransmitted signal vectors in the lens is bounded by

$$
\sum_{i \neq k} P[\mathbf{s}_i \text{ in lens}] = (M - 1)\,\frac{V_{\text{lens}}}{V_s}.
$$

Thus we have

$$
P[\mathcal{E} \mid m_k, \mathbf{s}_k, \mathbf{n}] < M\,\frac{V_{\text{lens}}}{V_s}. \tag{5.86a}
$$

It is clear from Fig. 5.25a that the volume $V_{\text{lens}}$ depends on $\mathbf{s}_k$ and $\mathbf{n}$. For
certain choices of these vectors $M(V_{\text{lens}}/V_s)$ will exceed unity, in which
case tighter results can be obtained by invoking the trivial bound

$$
P[\mathcal{E} \mid m_k, \mathbf{s}_k, \mathbf{n}] \le 1. \tag{5.86b}
$$

Figure 5.25b The intersection of two spheres is a lens. The enlargement shows the lens
enclosed within a third sphere, of radius h, centered on O'.


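(ii) Elimination of $\mathbf{s}_k$ and $\mathbf{n}$. We are now prepared to undertake the
second step of the proof, namely, to eliminate the continuous variables $\mathbf{s}_k$
and $\mathbf{n}$ from the conditional probability of error bounded in Eqs. 5.86:

$$
P[\mathcal{E} \mid m_k] = \int P[\mathcal{E} \mid m_k, \mathbf{s}_k = \boldsymbol{\alpha}, \mathbf{n} = \boldsymbol{\beta}]\;
p_{\mathbf{s}_k}(\boldsymbol{\alpha})\, p_{\mathbf{n}}(\boldsymbol{\beta})\, d\boldsymbol{\alpha}\, d\boldsymbol{\beta},
$$

where the integral is taken over all possible values of $\mathbf{s}_k$ and $\mathbf{n}$. In order to
apply the bounds of Eqs. 5.86a and b, we perform the integration in two
parts. In the first an integral is taken over a domain $\mathcal{D}$ of values of $\mathbf{s}_k$ and
$\mathbf{n}$ for which all pairs $(\mathbf{s}_k, \mathbf{n})$ are such that $V_{\text{lens}}$ is less than a (small)
constant, say $V^*_{\text{lens}}$. The second integral is over the remaining domain, $\mathcal{D}^c$,
of values of $\mathbf{s}_k$ and $\mathbf{n}$. The probability of $\mathcal{D}^c$ is later shown to be
sufficiently small that the bound of Eq. 5.86b may be used safely. Thus

$$
\begin{aligned}
P[\mathcal{E} \mid m_k]
&< \int_{\mathcal{D}} M\,\frac{V^*_{\text{lens}}}{V_s}\, p_{\mathbf{s}_k}(\boldsymbol{\alpha})\, p_{\mathbf{n}}(\boldsymbol{\beta})\, d\boldsymbol{\alpha}\, d\boldsymbol{\beta}
 + \int_{\mathcal{D}^c} p_{\mathbf{s}_k}(\boldsymbol{\alpha})\, p_{\mathbf{n}}(\boldsymbol{\beta})\, d\boldsymbol{\alpha}\, d\boldsymbol{\beta} \\
&= M\,\frac{V^*_{\text{lens}}}{V_s}\, P[\mathcal{D}] + P[\mathcal{D}^c] \\
&< M\,\frac{V^*_{\text{lens}}}{V_s} + P[\mathcal{D}^c].
\end{aligned}
\tag{5.87}
$$

In the evaluation we have defined $P[\mathcal{D}]$ and $P[\mathcal{D}^c]$ as the integrals over $\mathcal{D}$
and $\mathcal{D}^c$ respectively. In the last step we have used the fact that any
probability is overbounded by unity.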

We define the domain $\mathcal{D}^c$ to be those pairs $(\mathbf{s}_k, \mathbf{n})$ that satisfy at least one
of the following conditions:

1. $|\mathbf{s}_k|^2 < E_N - \Delta$

2. $|\mathbf{n}|^2 > \dfrac{N_0}{2} + \Delta$

3. $|\mathbf{r}|^2 < |\mathbf{s}_k|^2 + \dfrac{N_0}{2} - \Delta$.

The probability of $\mathcal{D}^c$ is the probability of the union of these events
and is bounded by the sum of their probabilities:

$$
P[\mathcal{D}^c] \le P\big[|\mathbf{s}_k|^2 < E_N - \Delta\big]
+ P\Big[|\mathbf{n}|^2 > \frac{N_0}{2} + \Delta\Big]
+ P\Big[|\mathbf{r}|^2 < |\mathbf{s}_k|^2 + \frac{N_0}{2} - \Delta\Big].
$$

By Eqs. 5.83, 5.66, and 5.68 each of the three terms on the right is less than
$\epsilon/6$ for sufficiently large N, so that†

$$
P[\mathcal{D}^c] < \frac{\epsilon}{2}. \tag{5.88}
$$

† To be precise, Eqs. 5.68b and c together with the Chebyshev inequality imply that the
conditional probability of the event $|\mathbf{r}|^2 < |\boldsymbol{\alpha}|^2 + N_0/2 - \Delta$, given $\mathbf{s}_k = \boldsymbol{\alpha}$, approaches
zero as N becomes large. But for any event A such that $P[A \mid \mathbf{s}_k = \boldsymbol{\alpha}] < \epsilon/6$ for all
allowable $\boldsymbol{\alpha}$ we also have $P[A] < \epsilon/6$.

We now show that the first term on the right-hand side of Eq. 5.87 is
also overbounded by $\epsilon/2$ for large N. For all pairs $(\mathbf{s}_k, \mathbf{n})$ in $\mathcal{D}$ we have,
from the defining conditions,

$$
|\mathbf{n}| \le \sqrt{\frac{N_0}{2} + \Delta}
\qquad \text{and} \qquad
|\mathbf{r}| \ge \sqrt{E_N + \frac{N_0}{2} - 2\Delta}.
\tag{5.89}
$$

From Fig. 5.25a it is clear that the size of the lens increases with decreasing
$|\mathbf{r}|$ and with increasing $|\mathbf{n}|$. Hence the largest lens for pairs $(\mathbf{s}_k, \mathbf{n})$ in $\mathcal{D}$
is achieved when the conditions of Eqs. 5.89 are satisfied with equality,
as shown in Fig. 5.26; from Eq. 5.85 we have

$$
V^*_{\text{lens}} < B_N (h^*)^N, \tag{5.90a}
$$

where $h^*$ is defined in the figure.

Figure 5.26 The maximum value of h for $(\mathbf{s}_k, \mathbf{n})$ in $\mathcal{D}$ is $h^*$.

To show that $V^*_{\text{lens}}$ is small enough so that $M V^*_{\text{lens}}/V_s$ is less than $\epsilon/2$,
we require a bound on $h^*$. It is clear from Fig. 5.26 that $h^*$ is a continuous
function of $\Delta$ for $\Delta$ near zero. In particular, if we let $h^\circ$ denote the value
of $h^*$ for $\Delta = 0$, we may write

$$
h^* = h^\circ + \delta, \tag{5.90b}
$$

where $\delta$ is positive and may be made arbitrarily small by taking $\Delta$ small.
It is an easy matter to compute $h^\circ$, since $\mathbf{s}_k$, $\mathbf{n}$, and $\mathbf{r}$ form a right triangle
when $\Delta = 0$, as shown in Fig. 5.27. Calculating the area of this triangle
first with $\mathbf{r}$ as the base and then with $\mathbf{s}_k$ as the base, we have

$$
\tfrac{1}{2}\, h^\circ \sqrt{E_N + \frac{N_0}{2}} = \tfrac{1}{2}\, \sqrt{E_N}\, \sqrt{\frac{N_0}{2}};
$$

hence

$$
h^\circ = \sqrt{\frac{E_N\,(N_0/2)}{E_N + N_0/2}}. \tag{5.90c}
$$

Figure 5.27 Geometry when $\Delta = 0$.


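Since $V_s$ is the volume of the signal sphere $I_s$, which has radius $\sqrt{E_N}$,
and since $M = 2^{N R_N}$, the desired bound on $M V^*_{\text{lens}}/V_s$ is

$$
M\,\frac{V^*_{\text{lens}}}{V_s} < 2^{N R_N}\, \frac{B_N (h^\circ + \delta)^N}{B_N E_N^{N/2}}
= \left[ 2^{R_N}\, \frac{h^\circ + \delta}{\sqrt{E_N}} \right]^N. \tag{5.91}
$$

If

$$
2^{R_N}\, \frac{h^\circ}{\sqrt{E_N}} = \frac{2^{R_N}}{\sqrt{1 + 2E_N/N_0}} < 1, \tag{5.92}
$$

then $\Delta$, hence $\delta$, can be taken small enough so that the term within square
brackets in Eq. 5.91 is less than 1. Equation 5.92 is equivalent to the
statement that $R_N < C_N = \tfrac{1}{2}\log_2(1 + 2E_N/N_0)$. Thus for sufficiently
large N

$$
M\,\frac{V^*_{\text{lens}}}{V_s} < \frac{\epsilon}{2} \qquad \text{if } R_N < C_N
$$

and

$$
P[\mathcal{E} \mid m_k] < M\,\frac{V^*_{\text{lens}}}{V_s} + P[\mathcal{D}^c] < \epsilon.
$$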
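
The geometric condition behind this step can be verified numerically. The following Python sketch computes $h^\circ$ as the altitude of the right triangle with legs $\sqrt{E_N}$ and $\sqrt{N_0/2}$, forms the bracketed term $2^{R_N}(h^\circ + \delta)/\sqrt{E_N}$, and shows that with $\delta = 0$ it equals 1 exactly at $R_N = C_N$. All function names are illustrative assumptions, and the normalization (signal sphere of radius $\sqrt{E_N}$) follows the text.

```python
import math


def capacity_per_dimension(e_n: float, n0: float) -> float:
    """C_N = (1/2) log2(1 + 2 E_N / N_0), in bits per dimension."""
    return 0.5 * math.log2(1.0 + 2.0 * e_n / n0)


def h_zero(e_n: float, n0: float) -> float:
    """Altitude onto the hypotenuse of the right triangle with legs
    sqrt(E_N) and sqrt(N_0 / 2)."""
    half_n0 = n0 / 2.0
    return math.sqrt(e_n * half_n0 / (e_n + half_n0))


def bracket_term(rate: float, e_n: float, n0: float, delta: float = 0.0) -> float:
    """2^{R_N} (h0 + delta) / sqrt(E_N): the quantity raised to the Nth power."""
    return 2.0 ** rate * (h_zero(e_n, n0) + delta) / math.sqrt(e_n)


def lens_ratio_bound(rate: float, e_n: float, n0: float, n_dims: int,
                     delta: float = 0.0) -> float:
    """Upper bound on M * V_lens / V_s for block length n_dims."""
    return bracket_term(rate, e_n, n0, delta) ** n_dims
```

With $E_N = 3$ and $N_0 = 2$ the capacity is exactly 1 bit per dimension; `bracket_term(0.9, 3.0, 2.0)` is below 1, so `lens_ratio_bound` decays exponentially with the block length, whereas at a rate of 1.1 the bracketed term exceeds 1 and the bound is useless.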

(iii) Elimination of $m_k$. The final step in the proof of the positive
channel capacity statement is to sum over the index k:

$$
P[\mathcal{E}] = \sum_{k=0}^{M-1} P[\mathcal{E} \mid m_k]\, P[m_k]
< \sum_{k=0}^{M-1} \epsilon\, P[m_k] = \epsilon. \tag{5.93}
$$

This completes the proof.

Discussion 

The concept of channel capacity is the fundament of modern com- 
munication theory. Before Shannon’s work communication engineers 
believed that noise in the channel set an inescapable limit on the accuracy 
of communication of a fixed-rate source. The capacity theorem states that 
noise (together with the available number of dimensions per second and 
the available signal power) sets an inescapable limit only on the rate at 
which accurate communication can be achieved, but not on the accuracy.

This theorem, proved here for an additive white Gaussian noise channel, 
is actually of vast applicability. It holds true for very general mathematical 
channel models. More important, every physical communication channel 
also exhibits phenomena that are consistent with the concept of an input 
bit rate that cannot be exceeded if communication accuracy is to be pre- 
served. In addition, this rate is usually significantly greater than that 
at which reliable system operation can be achieved by conventional means 
such as bit-by-bit signaling. 

To a considerable extent, research in communication theory is con- 
cerned with finding practical means of simultaneously attaining the higher 
accuracy and higher data rates predicted by channel capacity. Some of 
the problems inherent in trying to do so are considered in Chapter 6. 

5.6 RELIABILITY FUNCTIONS 

In Section 5.4 the ensemble average probability of error for communication
over an additive white Gaussian noise channel was shown to
satisfy the bound

$$
\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}, \tag{5.94}
$$

where N is the code block length, and $R_N$ is the rate in bits per dimension.
Equation 5.94 was arrived at by means of the simple argument that the
probability of a union is no greater than the sum of the probabilities of its
constituents.

The bound of Eq. 5.94 indicates that arbitrarily small error probabilities
can certainly be achieved if $R_N < R_0$. More elaborate bounding techniques
were used in Section 5.5 to prove the channel capacity statement
that arbitrarily small error probabilities can be achieved if and only if
$R_N < C_N$. The channel capacity statement is stronger, in the sense that
$C_N > R_0$. In another sense, however, Eq. 5.94 is stronger, in that knowledge
of $R_0$ enables the bounding of the error probability as a function of
N and $R_N$, whereas knowledge of $C_N$ alone does not.

More complete knowledge of the achievable error performance than
that provided by either $C_N$ or $R_0$ is embodied in a function called the
channel reliability function. We now derive the reliability function for the
infinite-bandwidth white Gaussian noise channel. The procedure is to
obtain a bound on the attainable probability of error for block orthogonal
signals which is tighter than that afforded by the union argument alone.

Block-Orthogonal Signaling 

When one of a set of $M = 2^{RT}$ equally likely orthogonal signals of
energy $E_s = P_s T$ is transmitted in white Gaussian noise of power density
$N_0/2$, the union bound of Eq. 5.14a may be rewritten as

$$
P[\mathcal{E}] < 2^{-T[(C_\infty/2) - R]}, \tag{5.95a}
$$

in which we have introduced the definition

$$
C_\infty \triangleq \lim_{D \to \infty} D\,C_N
= \lim_{D \to \infty} \frac{D}{2} \log_2\!\left(1 + \frac{2P_s}{N_0 D}\right)
= \frac{P_s}{N_0} \log_2 e. \tag{5.95b}
$$

$C_\infty$ is the limiting value of the white Gaussian noise channel capacity (in
bits per second) of Eq. 5.59 as the available dimensions per second, D,
hence the channel bandwidth, tends to infinity while $P_s$ and $N_0$ remain
fixed.
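
The limit that defines the infinite-bandwidth capacity is easy to observe numerically. The Python sketch below evaluates $D\,C_N = (D/2)\log_2(1 + 2P_s/(N_0 D))$ for growing D and compares it with $(P_s/N_0)\log_2 e$. The function names and the choice of passing the ratio $P_s/N_0$ as a single argument are assumptions made for illustration.

```python
import math


def capacity_bits_per_second(ps_over_n0: float, dims_per_second: float) -> float:
    """D * C_N for an energy per dimension E_N = P_s / D."""
    return 0.5 * dims_per_second * math.log2(1.0 + 2.0 * ps_over_n0 / dims_per_second)


def infinite_bandwidth_capacity(ps_over_n0: float) -> float:
    """Limiting capacity (P_s / N_0) * log2(e), in bits per second."""
    return ps_over_n0 * math.log2(math.e)


if __name__ == "__main__":
    ratio = 1.0  # P_s / N_0, chosen arbitrarily for the demonstration
    for d in (2.0, 20.0, 200.0, 2.0e6):
        print(d, capacity_bits_per_second(ratio, d))
    print("limit", infinite_bandwidth_capacity(ratio))
```

The capacity increases monotonically with D but saturates: beyond a few tens of dimensions per second per unit of $P_s/N_0$, additional bandwidth buys very little.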

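From Eq. 4.96, the exact expression for the probability bounded in
Eqs. 5.95 is $P[\mathcal{E}] = 1 - P[\mathcal{C}]$, in which

$$
P[\mathcal{C}] = \int_{-\infty}^{\infty} p_n(\beta - \sqrt{E_s})\, d\beta
\left[ \int_{-\infty}^{\beta} p_n(\nu)\, d\nu \right]^{M-1}, \tag{5.96a}
$$

$$
p_n(\alpha) = \frac{1}{\sqrt{\pi N_0}}\, e^{-\alpha^2/N_0}. \tag{5.96b}
$$

We now overbound $1 - P[\mathcal{C}]$ by arguments similar to, but more sensitive
than, the union bound. As a preliminary step we normalize Eq. 5.96 by
making the change of variables $\alpha = \beta/\sqrt{N_0/2}$, $\beta = \nu/\sqrt{N_0/2}$ (the symbol
$\beta$ is reused for the normalized inner variable), so that

$$
P[\mathcal{C}] = \int_{-\infty}^{\infty} p(\alpha - b)\, d\alpha
\left[ \int_{-\infty}^{\alpha} p(\beta)\, d\beta \right]^{M-1}, \tag{5.97a}
$$

in which

$$
p(\gamma) \triangleq \frac{1}{\sqrt{2\pi}}\, e^{-\gamma^2/2}, \tag{5.97b}
$$

$$
b \triangleq \sqrt{2E_s/N_0}. \tag{5.97c}
$$

Thus the probability of error, expressed in terms of the Q( ) function of
Eq. 2.50, is

$$
P[\mathcal{E}] = 1 - P[\mathcal{C}] = \int_{-\infty}^{\infty} p(\alpha - b)\, d\alpha\,
\big\{ 1 - [1 - Q(\alpha)]^{M-1} \big\}. \tag{5.97d}
$$

The term in braces is the probability that at least one of M - 1 (independent)
noise components exceeds $\alpha$; by the union argument it is
bounded above by the sum of the probabilities that individual components
exceed $\alpha$:

$$
\big\{ 1 - [1 - Q(\alpha)]^{M-1} \big\} \le (M - 1)\, Q(\alpha) < M\, Q(\alpha). \tag{5.98a}
$$

Since it is a probability, it is also overbounded by one:

$$
\big\{ 1 - [1 - Q(\alpha)]^{M-1} \big\} \le 1. \tag{5.98b}
$$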


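The unity bound is tighter when $\alpha$ is small and $Q(\alpha)$ large; the bound
$M Q(\alpha)$ is tighter when $\alpha$ is large. It is therefore convenient to split the
range of integration on $\alpha$ into two parts, thereby obtaining

$$
\begin{aligned}
P[\mathcal{E}] &< \int_{-\infty}^{a} p(\alpha - b)\, d\alpha + M \int_{a}^{\infty} p(\alpha - b)\, Q(\alpha)\, d\alpha \\
&< \int_{-\infty}^{a} p(\alpha - b)\, d\alpha + M \int_{a}^{\infty} p(\alpha - b)\, e^{-\alpha^2/2}\, d\alpha; \qquad 0 \le a.
\end{aligned}
$$

Here we have also invoked Eq. 2.122 and the condition $a \ge 0$ to overbound
$Q(\alpha)$ by $e^{-\alpha^2/2}$. Denoting the first integral by $P_1$ and the second by $P_2$,
we have

$$
P[\mathcal{E}] < P_1 + M P_2. \tag{5.99}
$$

The bound is minimized by choosing a as the solution to

$$
0 = \frac{d}{da}\,[P_1 + M P_2] = p(a - b) - M\, p(a - b)\, e^{-a^2/2},
$$

whence

$$
e^{a^2/2} = M. \tag{5.100}
$$

The next step is to bound $P_1$ and $P_2$. We have

$$
P_1 = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}}\, e^{-(\alpha - b)^2/2}\, d\alpha
= \int_{-\infty}^{a-b} \frac{1}{\sqrt{2\pi}}\, e^{-\gamma^2/2}\, d\gamma = Q(b - a),
$$

$$
P_1 < e^{-(a-b)^2/2}; \qquad a \le b. \tag{5.101}
$$

Similarly,

$$
\begin{aligned}
P_2 &= \int_{a}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(\alpha - b)^2/2}\, e^{-\alpha^2/2}\, d\alpha \\
&= e^{-b^2/4} \int_{a}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(\alpha - b/2)^2}\, d\alpha \\
&= e^{-b^2/4}\, \frac{1}{\sqrt{2}} \int_{\sqrt{2}(a - b/2)}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\gamma^2/2}\, d\gamma,
\end{aligned}
$$

$$
P_2 <
\begin{cases}
e^{-b^2/4}; & a \le \dfrac{b}{2}, \\[2mm]
e^{-(b^2/4) - (a - b/2)^2}; & a \ge \dfrac{b}{2}.
\end{cases}
\tag{5.102}
$$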

(In Eq. 5.102 it is important to note the values of a for which the
inequalities are valid.) Substituting Eqs. 5.102 and 5.101 in Eq. 5.99 and
replacing M with $e^{a^2/2}$ in accordance with Eq. 5.100 yields

$$
P[\mathcal{E}] <
\begin{cases}
e^{-(a-b)^2/2} + e^{a^2/2}\, e^{-b^2/4}; & 0 \le a \le \dfrac{b}{2}, \\[2mm]
e^{-(a-b)^2/2} + e^{a^2/2}\, e^{-(b^2/4) - (a - b/2)^2}; & \dfrac{b}{2} \le a \le b.
\end{cases}
\tag{5.103}
$$

The final step in bounding $P[\mathcal{E}]$ is to simplify Eq. 5.103. Since

$$
\left( \frac{a^2}{2} - \frac{b^2}{4} \right) + \frac{(a-b)^2}{2} = \left( a - \frac{b}{2} \right)^2 \ge 0,
$$

the second term in the bound on $P[\mathcal{E}]$ for $0 \le a \le b/2$ is larger than the
first. For $b/2 \le a \le b$, the exponents of the two terms are the same. Thus

$$
P[\mathcal{E}] <
\begin{cases}
2\, e^{-(b^2/4) + (a^2/2)}; & 0 \le a \le \dfrac{b}{2}, \\[2mm]
2\, e^{-(a-b)^2/2}; & \dfrac{b}{2} \le a \le b.
\end{cases}
\tag{5.104}
$$

Equation 5.104 can be written in terms of the original parameters of the
communication problem by substituting

$$
\begin{aligned}
a &= \sqrt{2 \ln M} = \sqrt{RT\,(2 \ln 2)}, \\
b &= \sqrt{2E_s/N_0} = \sqrt{2(TP_s/N_0)} = \sqrt{T C_\infty\,(2 \ln 2)}.
\end{aligned}
\tag{5.105}
$$

We then have

$$
P[\mathcal{E}] < 2 \cdot 2^{-T E^*(R)}, \tag{5.106a}
$$

in which

$$
E^*(R) =
\begin{cases}
\tfrac{1}{2} C_\infty - R; & 0 \le R \le \tfrac{1}{4} C_\infty, \\[1mm]
\big( \sqrt{C_\infty} - \sqrt{R} \big)^2; & \tfrac{1}{4} C_\infty \le R \le C_\infty.
\end{cases}
\tag{5.106b}
$$

Equation 5.106 is the desired result: the exponential factor $E^*(R)$ is
the channel reliability function. A normalized plot of $E^*(R)$ is shown in
Fig. 5.28. We note that $E^*(R)$ coincides with the exponent of the union
bound for $R \le C_\infty/4$, but yields a tighter result for $C_\infty/4 < R < C_\infty$.
The fact that $E^*(R) = 0$ for $R = C_\infty$ reflects the channel capacity
constraint.
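
The piecewise form of the reliability function is simple to tabulate. The Python sketch below implements $E^*(R)$ as a straight line $C_\infty/2 - R$ up to $R = C_\infty/4$ and as $(\sqrt{C_\infty} - \sqrt{R})^2$ from there to capacity, together with the resulting bound $2 \cdot 2^{-T E^*(R)}$. Function and argument names are illustrative assumptions; rates are in bits per second and T in seconds.

```python
import math


def reliability_exponent(rate: float, c_inf: float) -> float:
    """E*(R) for block-orthogonal signaling on the infinite-bandwidth channel."""
    if not 0.0 <= rate <= c_inf:
        raise ValueError("E*(R) is evaluated here only for 0 <= R <= C_inf")
    if rate <= c_inf / 4.0:
        return c_inf / 2.0 - rate          # coincides with the union-bound exponent
    return (math.sqrt(c_inf) - math.sqrt(rate)) ** 2


def error_probability_bound(rate: float, c_inf: float, duration: float) -> float:
    """Upper bound 2 * 2^(-T E*(R)) on the error probability."""
    return 2.0 * 2.0 ** (-duration * reliability_exponent(rate, c_inf))


if __name__ == "__main__":
    c_inf = 1.0
    for r in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(r, reliability_exponent(r, c_inf))
```

The two branches agree at $R = C_\infty/4$, where both equal $C_\infty/4$, so the exponent is continuous. Above that rate the union-bound exponent $C_\infty/2 - R$ reaches zero at $C_\infty/2$, while $E^*(R)$ stays positive all the way to capacity.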

It is possible to show (refs. 32, 93) that the foregoing bound is exponentially
tight; this means that for no rate R can a number greater than $E^*(R)$ be
substituted in Eq. 5.106a without invalidating the inequality for large
values of T. The equivalent mathematical statement is

$$
P[\mathcal{E}] > B\, 2^{-T E^*(R)}, \tag{5.106c}
$$

in which the coefficient B decreases only slowly (nonexponentially) as a
function of T. Although stated for orthogonal signals, Eq. 5.106c pertains
also to any other signal set. This follows from the fact that orthogonal
signals provide substantially the same error performance as optimum
(simplex) signals when M is large. Thus the reliability function of Eq.
5.106b is both an upper and a lower bound on the exponential behavior of the
error probability attainable with the infinite bandwidth additive white
Gaussian noise channel.
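
As a numerical sanity check on the upper bound, the exact error probability of Eq. 5.97d can be integrated directly and compared with the bound of Eq. 5.104. The Python sketch below does so with a plain trapezoidal rule; the integration range, the step count, and all function names are assumptions chosen for illustration, and the comparison is meaningful only for $a = \sqrt{2 \ln M} \le b$ (rates below capacity).

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def unit_gaussian_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def exact_error_probability(m: int, b: float, steps: int = 20000,
                            span: float = 10.0) -> float:
    """Trapezoidal evaluation of Eq. 5.97d over alpha in [b - span, b + span]."""
    lo = b - span
    width = 2.0 * span / steps
    total = 0.0
    for i in range(steps + 1):
        alpha = lo + i * width
        # Probability that at least one of the M - 1 noise components exceeds alpha.
        brace = 1.0 - (1.0 - q_function(alpha)) ** (m - 1)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * unit_gaussian_pdf(alpha - b) * brace
    return total * width


def split_integral_bound(m: int, b: float) -> float:
    """Right-hand side of Eq. 5.104 with a chosen so that exp(a^2 / 2) = M."""
    a = math.sqrt(2.0 * math.log(m))
    if a > b:
        raise ValueError("the bound requires a <= b, i.e. a rate below capacity")
    if a <= b / 2.0:
        return 2.0 * math.exp(-(b * b) / 4.0 + (a * a) / 2.0)
    return 2.0 * math.exp(-((a - b) ** 2) / 2.0)
```

For example, `exact_error_probability(16, 6.0)` lies well below `split_integral_bound(16, 6.0)`. The gap reflects the nonexponential coefficient; only the exponent is claimed to be tight.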

Other Channels 

Bounds analogous to Eqs. 5.106a and c can be derived for an extremely
broad class of realistic communication channel models (refs. 8, 27, 32). In particular,
random coding arguments somewhat more elaborate than those we have
encountered here can be used to evaluate a reliability function and write a
tight upper bound in the form

$$
\overline{P[\mathcal{E}]} < \overline{A}\, 2^{-T E'(R)}. \tag{5.107a}
$$

The bound is on the mean error probability over an ensemble of communication
systems, each of which uses a different code. In terms of the
dimensions per codeword, N, and the dimensions per second, D, afforded
by the channel, the bound is

$$
\overline{P[\mathcal{E}]} < \overline{A}\, 2^{-N \underline{E}(R_N)}, \tag{5.107b}
$$

where

$$
\underline{E}(R_N) = \frac{1}{D}\, E'(D R_N). \tag{5.107c}
$$

For these channels tight lower bounds to the error probability can also
be derived and written in the form

$$
P[\mathcal{E}] > \underline{A}\, 2^{-N \overline{E}(R_N)}. \tag{5.108}
$$

Combining Eqs. 5.107b and 5.108, we have

$$
\underline{A}\, 2^{-N \overline{E}(R_N)} < P[\mathcal{E}] < \overline{A}\, 2^{-N \underline{E}(R_N)}. \tag{5.109}
$$

Since the coefficients $\underline{A}$ and $\overline{A}$ both can be shown to vary only slowly
with N, the reliability functions $\overline{E}(R_N)$ and $\underline{E}(R_N)$, evaluated for a
particular channel, represent upper and lower bounds on the exponential
behavior of the probability of error attainable when communicating over
that channel. In writing Eq. 5.109, we have used the fact that not all of
the systems in an ensemble can yield a $P[\mathcal{E}] > \overline{P[\mathcal{E}]}$, so that at least one
code exists for which the upper bound of Eq. 5.109 is valid.

The generic form of the functions $\underline{E}(R_N)$ and $\overline{E}(R_N)$ is illustrated in
Fig. 5.29. Of course, for different channels the values of the parameters
$C_N$, $R_0$, $R_c$, $R_e$, and $E(0)$ are different; the shape of the curves shown,
however, is extremely general.† In particular, the equality

$$
\underline{E}(R_N) = \overline{E}(R_N); \qquad R_c \le R_N \le C_N, \tag{5.110}
$$

where $R_c$ is called the critical rate, is always true. Thus the exponential
behavior of the attainable $P[\mathcal{E}]$ is precisely determined for rates near
channel capacity.

Equation 5.110 is a remarkable result: $\underline{E}(R_N)$ relates to the average
error behavior over an ensemble of all N-symbol codes of rate $R_N$,
whereas $\overline{E}(R_N)$ relates to the best conceivable error behavior. Recalling
that the probability of the set of systems (codes) for which $P[\mathcal{E}] > a\,\overline{P[\mathcal{E}]}$
cannot exceed 1/a, we see that Eq. 5.110 implies that a preponderance of
the codes in the ensemble are exponentially optimum for rates greater than
critical.

† In certain instances the curves may exhibit degeneracies. As an example, for the
infinite bandwidth white Gaussian noise channel $\underline{E}(0) = R_0$ and $\underline{E}(R_N) = \overline{E}(R_N)$ for
all $R_N$.

The union bound,

$$
\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}, \tag{5.111}
$$

is indicated in Fig. 5.29 by the dashed line. The ensemble exponent
$\underline{E}(R_N)$ always coincides with the union exponent for $R_e \le R_N \le R_c$, but
in general

$$
\underline{E}(R_N) \ge R_0 - R_N. \tag{5.112}
$$

The improvement at low rates is obtained by expurgating the ensemble to
eliminate those systems in which the error probability is dominated by
poor codeword selection rather than by the effects of channel disturbance.
The parameter $R_e$ is called the expurgation rate. Unfortunately, practical
procedures for actually carrying out the expurgation procedure and
attaining the expurgation exponent have not yet been devised.

The curves $\underline{E}(R_N)$ and $\overline{E}(R_N)$ for a specific channel embody detailed
knowledge of the attainable error performance. Although less detailed,
the knowledge conveyed just by the value of $R_0$ is also exceedingly
informative. In particular, Fig. 5.29 illustrates that the union bound of
Eq. 5.111 is exponentially equivalent to the lower bound of Eq. 5.108 for
$R_N \approx R_c$. Thus the value of $R_0$ provides an accurate characterization of
the exponential error behavior attainable at rates near critical.

The advantage in simplicity to be gained from using a single-parameter
descriptor is obvious, and in our study of the implementation of coding
in the next chapter we focus attention primarily on $R_0$.


APPENDIX 5A BANDWIDTH-CONSTRAINED ORTHONORMAL
FUNCTIONS

In this appendix we consider certain implications of two theorems, 03
one due to Landau and Pollak and the other to Shannon. These
theorems concern any function, say $f(t)$, that satisfies the following
conditions:

(a) $f(t)$ is identically zero outside the interval $[-T/2, T/2]$.

The theorems state that there exists a particular set of orthonormal
functions† $\{\Psi_i(t)\}$, $i = 0, 1, 2, \ldots$, with the following properties. Each
$\Psi_i(t)$ is identically zero for $|t| > T/2$, and for every $f(t)$ satisfying (a),
(b), and (c)

$$
\int_{-\infty}^{\infty} \Big[ f(t) - \sum_{i=0}^{L-1} f_i\, \Psi_i(t) \Big]^2 dt < \epsilon^2, \tag{5A.1}
$$

$$
f_i \triangleq \int_{-\infty}^{\infty} f(t)\, \Psi_i(t)\, dt. \tag{5A.2}
$$

(i) Landau and Pollak

$$
L = \text{largest integer} < 2TW + 1, \qquad \epsilon^2 = 12\,\eta_W^2. \tag{5A.3}
$$

(ii) Shannon

L= largest integer < 2TW + + - 2 In 2Tw\ 

A \ IT / 


€ 2 _ ^VW 2 
(12 - A) 


for all A, 0 < A < 12. (5 A .4) 


5A.1 Constrained Linear Combinations

Suppose that $\{\varphi_j(t)\}$ is a set of N orthonormal functions such that
every unit-energy linear combination of them, for example,

$$
g(t) = \sum_{j=1}^{N} g_j\, \varphi_j(t); \qquad \sum_{j=1}^{N} g_j^2 = 1, \tag{5A.5}
$$

satisfies conditions (a), (b), and (c) with $\eta_W^2 = \tfrac{1}{12}$. (In particular, each
of the $\varphi_j(t)$ satisfies the conditions.) Then the theorem of Landau and
Pollak may be used to show that the number of functions, N, in the set
$\{\varphi_j(t)\}$ is constrained by

$$
N \le L < 2TW + 1,
$$

hence

$$
\lim_{T \to \infty} \frac{N}{T} \le 2W \text{ dimensions/sec.} \tag{5A.6}
$$

t The orthonormal functions {¥*,(/)} are related to the prolate spheroidal wave functions 
with parameters T and W, say { ?/;,(;)}, by 

= 1,1 <Z 2 


0; otherwise. 


The $\{\lambda_i\}$ are normalization constants.

Proof is by contradiction. Suppose that $\eta_W^2 = \tfrac{1}{12}$ and $N > L$. Then
a linear combination of the $\{\varphi_j(t)\}$ exists (ref. 10) which is orthogonal to each of
the L functions $\{\Psi_i(t)\}$. Let $g(t)$ be this linear combination, normalized to
unit energy. Then

$$
g_i = \int_{-\infty}^{\infty} g(t)\, \Psi_i(t)\, dt = 0; \qquad i = 0, 1, \ldots, L - 1, \tag{5A.7}
$$

and

$$
\int_{-\infty}^{\infty} \Big[ g(t) - \sum_{i=0}^{L-1} g_i\, \Psi_i(t) \Big]^2 dt = \int_{-\infty}^{\infty} g^2(t)\, dt = 1. \tag{5A.8}
$$

On the other hand, for $\eta_W^2 = \tfrac{1}{12}$ the Landau and Pollak result requires

$$
\int_{-\infty}^{\infty} \Big[ g(t) - \sum_{i=0}^{L-1} g_i\, \Psi_i(t) \Big]^2 dt < 12\,\eta_W^2 = 1, \tag{5A.9}
$$

which contradicts Eq. 5A.8, hence contradicts the hypothesis $N > L$.


5A.2 Constrained Orthonormal Functions

We now use the theorem of Shannon to bound the number of T-sec
duration orthonormal functions $\{\varphi_j(t)\}$ that satisfy conditions somewhat
weaker than those of Section 5A.1; instead of requiring that every linear
combination of the $\{\varphi_j(t)\}$ meet the bandwidth constraint of condition
(c), we now require only that each meet this condition individually.
Thus each $\varphi_j(t)$, $j = 1, 2, \ldots, N$, is required to have at most a fraction
$\eta_W^2$ of its energy outside the band $[-W, W]$, although it is possible that
some linear combination of the $\{\varphi_j(t)\}$ have more. In this case we shall
obtain the (weaker) bound

< 2W 


12 - A 

12(1 — ij w z ) — A 


2TWA 


1 + In 2TW 


for all A, 0 < A < 12, (5A.10a) 


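$$
\lim_{T \to \infty} \frac{N}{T} \le \frac{2W}{1 - \eta_W^2}. \tag{5A.10b}
$$

Although both bounds in Eqs. 5A.10 are always greater than 2W, they
exceed 2W by very little when 2TW is large and $\eta_W^2$ is small. For
$2TW > 100$ and $\eta_W^2 = \tfrac{1}{12}$ Eq. 5A.10a states that $N/2TW < 1.2$.

The first step in proving Eq. 5A.10a is to note from Eq. 5A.1 that

$$
\int_{-\infty}^{\infty} \Big[ \varphi_j(t) - \sum_{i=0}^{L-1} a_{ij}\, \Psi_i(t) \Big]^2 dt < \epsilon^2; \qquad j = 1, 2, \ldots, N, \tag{5A.11}
$$

$$
a_{ij} \triangleq \int_{-\infty}^{\infty} \varphi_j(t)\, \Psi_i(t)\, dt. \tag{5A.12}
$$

But the left-hand side of Eq. 5A.11 is

$$
\begin{aligned}
\int_{-\infty}^{\infty} \varphi_j^2(t)\, dt
&- 2 \sum_{i=0}^{L-1} a_{ij} \int_{-\infty}^{\infty} \varphi_j(t)\, \Psi_i(t)\, dt
+ \sum_{i=0}^{L-1} a_{ij}^2 \int_{-\infty}^{\infty} \Psi_i^2(t)\, dt \\
&= 1 - 2 \sum_{i=0}^{L-1} a_{ij}^2 + \sum_{i=0}^{L-1} a_{ij}^2 = 1 - \sum_{i=0}^{L-1} a_{ij}^2.
\end{aligned}
\tag{5A.13}
$$

Substituting Eq. 5A.13 in Eq. 5A.11 and summing over j, we have

$$
\sum_{j=1}^{N} \Big[ 1 - \sum_{i=0}^{L-1} a_{ij}^2 \Big] = N - \sum_{j=1}^{N} \sum_{i=0}^{L-1} a_{ij}^2 < N \epsilon^2,
$$

or

$$
\sum_{i=0}^{L-1} \Big( \sum_{j=1}^{N} a_{ij}^2 \Big) > N(1 - \epsilon^2). \tag{5A.14}
$$

The next step is to note that the $\{\Psi_i(t)\}$ may be expanded in terms of
the $\{\varphi_j(t)\}$ in accordance with the Gram-Schmidt procedure of Appendix
4A:

$$
\Psi_i(t) = \sum_{j=1}^{N} a_{ij}\, \varphi_j(t) + \theta_i(t),
$$

where $\theta_i(t)$ represents the part of $\Psi_i(t)$ that is orthogonal to all of the
$\{\varphi_j(t)\}$. Thus

$$
1 = \int_{-\infty}^{\infty} \Psi_i^2(t)\, dt = \sum_{j=1}^{N} a_{ij}^2 + \int_{-\infty}^{\infty} \theta_i^2(t)\, dt,
$$

so that

$$
1 \ge \sum_{j=1}^{N} a_{ij}^2. \tag{5A.15}
$$

Substitution of Eq. 5A.15 in Eq. 5A.14 yields

$$
L = \sum_{i=0}^{L-1} 1 > N(1 - \epsilon^2). \tag{5A.16}
$$

Substituting the values of L and $\epsilon^2$ from Eq. 5A.4 yields Eq. 5A.10.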

5A.3 Discussion

We conclude that the number of orthonormal functions with energy
concentrated in $[-W, W]$ increases linearly with T, at best. The
proportionality constant of the bound is linear in W and equals 2W when all
linear combinations must also have energy concentrated in $[-W, W]$.
The proportionality constant of the bound is slightly larger than 2W if
only the orthonormal functions themselves must satisfy the energy
concentration condition.

APPENDIX 5B BANDLIMITED WAVEFORMS

Consider an ideal bandlimited waveform such as

$$
g(t) = \int_{-W}^{W} G(f)\, e^{j2\pi f t}\, df, \tag{5B.1}
$$

$$
G(f) = \int_{-\infty}^{\infty} g(t)\, e^{-j2\pi f t}\, dt. \tag{5B.2}
$$

In this appendix we show that if $g(t)$ is identically zero over any interval
$a < t < b$ of nonzero length, then $g(t)$ is identically zero for all t.

Figure 5B.1 A function $g(t)$ identically zero in $[a, b]$.

Preliminary insight is gained from an apparent inconsistency. Assume

$$
g(t) = 0; \qquad a < t < b \tag{5B.3}
$$

and define

$$
h(t) \triangleq
\begin{cases}
0; & |t - t_1| < \Delta, \\
1; & \text{elsewhere},
\end{cases}
\tag{5B.4}
$$

where $\Delta$ and $t_1$ are chosen so that $t_1 + \Delta$ and $t_1 - \Delta$ both fall within the
interval $[a, b]$. Then, as shown in Fig. 5B.1,

$$
g(t) = g(t)\, h(t). \tag{5B.5}
$$

Taking the Fourier transform of both sides of Eq. 5B.5 yields

$$
G(f) = G(f) * H(f). \tag{5B.6}
$$

But

$$
|H(f)| = \delta(f) + 2\Delta \left| \frac{\sin 2\pi f \Delta}{2\pi f \Delta} \right|. \tag{5B.7}
$$

It is evident in Fig. 5B.2 that convolving a bandlimited spectrum $G(f)$
with $H(f)$ yields a spectrum that is not bandlimited. We infer that Eqs.
5B.1, 5B.3, and 5B.6 cannot all be valid simultaneously unless $G(f) = 0$.

Figure 5B.2 Spectra of $g(t)$ and $h(t)$.

A formal proof that provides additional insight involves the power
series expansion of $g(t)$. From Eq. 5B.1, the kth derivative of $g(t)$ is

$$
g^{(k)}(t) = \int_{-W}^{W} (j2\pi f)^k\, G(f)\, e^{j2\pi f t}\, df. \tag{5B.8}
$$

Thus

$$
|g^{(k)}(t)| \le (2\pi W)^k \int_{-W}^{W} |G(f)|\, df < (2\pi W)^k \int_{-W}^{W} \big[ 1 + |G(f)|^2 \big]\, df
$$

or, if $E_g$ is the energy of $g(t)$,

$$
|g^{(k)}(t)| < (2\pi W)^k (E_g + 2W). \tag{5B.9}
$$

The k-term power-series expansion of $g(t)$ around the point $t_1$ is

$$
g(t) = g(t_1) + g^{(1)}(t_1)(t - t_1) + \frac{g^{(2)}(t_1)}{2!}(t - t_1)^2
+ \cdots + \frac{g^{(k-1)}(t_1)}{(k-1)!}(t - t_1)^{k-1} + R_k, \tag{5B.10}
$$

in which the remainder term $R_k$ is given by

$$
R_k = \frac{(t - t_1)^k}{k!}\, g^{(k)}(\tau) \tag{5B.11}
$$

with $\tau$ some number between t and $t_1$. Thus

$$
|R_k| < \frac{|t - t_1|^k}{k!}\, (2\pi W)^k (E_g + 2W). \tag{5B.12}
$$
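
The factorial in the denominator of the remainder bound eventually dominates the geometric growth of the numerator, whatever the distance $|t - t_1|$. The Python sketch below evaluates the bound $\frac{|t - t_1|^k}{k!}(2\pi W)^k (E_g + 2W)$ in the logarithmic domain to avoid overflow. The function name and the sample values of W, $E_g$, and $|t - t_1|$ are assumptions made for illustration.

```python
import math


def remainder_bound(k: int, distance: float, bandwidth: float, energy: float) -> float:
    """Bound on |R_k| for a k-term expansion at a point `distance` from t_1.

    distance  : |t - t_1| > 0, in seconds
    bandwidth : W, in cycles per second
    energy    : E_g, the energy of g(t)
    """
    if distance <= 0.0 or bandwidth <= 0.0:
        raise ValueError("distance and bandwidth must be positive")
    x = 2.0 * math.pi * bandwidth * distance
    # log of x^k / k! * (E_g + 2W); lgamma(k + 1) = log(k!)
    log_bound = k * math.log(x) - math.lgamma(k + 1) + math.log(energy + 2.0 * bandwidth)
    return math.exp(log_bound)


if __name__ == "__main__":
    for k in (10, 30, 50, 100):
        print(k, remainder_bound(k, distance=2.0, bandwidth=1.0, energy=1.0))
```

With W = 1, $E_g = 1$, and $|t - t_1| = 2$ the bound first grows (it exceeds 1 at k = 10) and then collapses once k passes roughly $2\pi W |t - t_1| \approx 12.6$; by k = 100 it is below $10^{-40}$.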

Precise knowledge of $g(t)$ over any interval of nonzero length permits
the calculation of every derivative of $g(t)$ at the midpoint of this interval
and thus the construction of the power series. Moreover, the bound of
Eq. 5B.12, hence $R_k$, goes to zero as $k \to \infty$ for every t, so that the
infinite power series is everywhere absolutely convergent and represents
the function $g(t)$ completely. It follows that if $g(t) = 0$ over any interval,
it is identically zero everywhere.

APPENDIX 5C OPTIMIZATION OF $R_0$

Given any transmitter alphabet $\{a_l\}$ and associated probabilities $\{p_l\}$,
$l = 1, 2, \ldots, A$, the random coding union bound of Eq. 5.55 for the
additive white Gaussian noise channel states that

$$
\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}, \tag{5C.1}
$$

where

$$
R_0 = -\log_2 \sum_{l=1}^{A} \sum_{h=1}^{A} p_l\, b_{lh}\, p_h \tag{5C.2}
$$

and

$$
b_{lh} \triangleq e^{-(a_l - a_h)^2/4N_0} = b_{hl}. \tag{5C.3}
$$

In Eq. 5C.1, $\overline{P[\mathcal{E}]}$ is the average probability of error for codes of length
N over the ensemble of communication systems in which the probability
that any signal component $s_{ij}$ is assigned letter $a_l$ is $p_l$, independent of
all other component assignments.

We desire to find the $\{p_l\}$ for which $R_0$ is maximum, subject to the
constraints

$$
p_l \ge 0; \qquad l = 1, 2, \ldots, A \tag{5C.4a}
$$

and

$$
\sum_{l=1}^{A} p_l = 1. \tag{5C.4b}
$$

The exponent $R_0$ may be maximized by minimizing the double summation
in Eq. 5C.2. Let $2\lambda$ be a Lagrange multiplier. Then

$$
\frac{\partial}{\partial p_l} \Big[ \sum_{i=1}^{A} \sum_{h=1}^{A} p_i\, b_{ih}\, p_h - 2\lambda \sum_{i=1}^{A} p_i \Big]
= 2 \Big[ \sum_{h=1}^{A} b_{lh}\, p_h - \lambda \Big]; \qquad l = 1, 2, \ldots, A. \tag{5C.5}
$$

Setting each partial derivative equal to zero yields the set of A inhomogeneous
linear equations

$$
\sum_{h=1}^{A} b_{lh}\, p_h = \lambda; \qquad l = 1, 2, \ldots, A. \tag{5C.6}
$$

The value of $\lambda$ is determined from the constraint $\sum_{h=1}^{A} p_h = 1$.

Whenever the $\{p_l\}$ that solve Eq. 5C.6 are all non-negative, these $\{p_l\}$
maximize $R_0$. We then have

$$
\sum_{l=1}^{A} \sum_{h=1}^{A} p_l\, b_{lh}\, p_h = \lambda \sum_{l=1}^{A} p_l = \lambda, \tag{5C.7}
$$

$$
R_{0,\max} = -\log_2 \lambda; \qquad \text{all } p_l \ge 0. \tag{5C.8}
$$

If the $b_{lh}$ all sum to the same number,

$$
\sum_{h=1}^{A} b_{lh} = b; \qquad l = 1, 2, \ldots, A, \tag{5C.9a}
$$

the solution to Eq. 5C.6 assigns equal probability to each letter of the
alphabet. This is immediately apparent from the fact that Eq. 5C.6 then
becomes

$$
\sum_{h=1}^{A} b_{lh}\, p_h = \frac{1}{A} \sum_{h=1}^{A} b_{lh} = \frac{b}{A} = \lambda; \qquad l = 1, 2, \ldots, A. \tag{5C.9b}
$$

In this case

$$
R_{0,\max} = -\log_2 \frac{b}{A}. \tag{5C.9c}
$$

When some of the $\{p_l\}$ that solve Eq. 5C.6 are negative, the Lagrange
solution is not a valid probability assignment. The implication is that
some of the $\{p_l\}$ should be set to zero, which means that there are too
many letters in the transmitter alphabet $\{a_l\}$. The engineering solution is
to reduce the number of letters, so that they may be spread farther apart
without violating the energy constraint of Fig. 5.14.
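
The Lagrange conditions amount to a small linear system, which the following Python sketch solves with plain Gaussian elimination: it finds x with $\sum_h b_{lh} x_h = 1$ and rescales so that the probabilities sum to one. The helper names are illustrative, and the kernel used in `example_b` (a Gaussian in the letter separation, with scalar letters) is an assumption made for the demonstration; any symmetric matrix of $b_{lh}$ values may be substituted.

```python
import math


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def lagrange_solution(b):
    """Solve sum_h b[l][h] * p[h] = lam for all l, with sum(p) = 1."""
    x = solve_linear(b, [1.0] * len(b))   # B x = 1, so p = lam * x
    lam = 1.0 / sum(x)
    return [lam * xi for xi in x], lam


def example_b(letters, n0):
    """ASSUMED kernel: b_lh = exp(-(a_l - a_h)^2 / (4 N0)) for scalar letters."""
    return [[math.exp(-((al - ah) ** 2) / (4.0 * n0)) for ah in letters]
            for al in letters]


if __name__ == "__main__":
    for letters in ([-1.0, 1.0], [-1.0, 0.0, 1.0]):
        p, lam = lagrange_solution(example_b(letters, 1.0))
        print(letters, p, "R0 candidate:", -math.log2(lam))
```

For the two-letter alphabet the solution is the equiprobable assignment, as the row sums are equal. For the three closely spaced letters the middle probability comes out negative, which is the situation described above: the Lagrange solution is not a valid assignment and the alphabet has too many letters for the available spacing.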

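APPENDIX 5D THE VOLUME OF AN N-DIMENSIONAL SPHERE

An N-dimensional sphere of radius $\rho$ is defined to be the locus of all
points

$$
\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N) \tag{5D.1}
$$

such that

$$
|\boldsymbol{\alpha}|^2 = \sum_{i=1}^{N} \alpha_i^2 \le \rho^2. \tag{5D.2}
$$

Thus the volume of an N-dimensional sphere of radius $\rho$ is

$$
V(\rho) = \int_{|\boldsymbol{\alpha}|^2 \le \rho^2} d\boldsymbol{\alpha}
= \int\!\!\int \cdots \int_{\alpha_1^2 + \alpha_2^2 + \cdots + \alpha_N^2 \le \rho^2} d\alpha_1\, d\alpha_2 \cdots d\alpha_N. \tag{5D.3}
$$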
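Making the changes of variable

$$
\beta_j = \frac{\alpha_j}{\rho}; \qquad j = 1, 2, \ldots, N, \tag{5D.4}
$$

we have

$$
V(\rho) = \rho^N \int\!\!\int \cdots \int_{\beta_1^2 + \beta_2^2 + \cdots + \beta_N^2 \le 1} d\beta_1\, d\beta_2 \cdots d\beta_N
= \rho^N V(1) \triangleq B_N \rho^N. \tag{5D.5}
$$

We still have to determine the volume, $B_N$, of an N-dimensional sphere
of unit radius. We begin indirectly by considering a set $\mathbf{n} = (n_1, n_2, \ldots,
n_N)$ of N zero-mean, unit-variance, statistically independent Gaussian
random variables, with probability density function

$$
p_{\mathbf{n}}(\boldsymbol{\alpha}) = \frac{1}{(2\pi)^{N/2}}\, e^{-|\boldsymbol{\alpha}|^2/2}. \tag{5D.6}
$$

Now consider the probability that $\mathbf{n}$ will lie in the thin spherical shell
contained between concentric spheres of radii $\rho$ and $\rho - \Delta$. If $\Delta$ is small,
$|\mathbf{n}|$ is very nearly constant for all $\mathbf{n}$ in the shell. This probability is therefore
very nearly equal to the volume of the shell times the value of $p_{\mathbf{n}}(\boldsymbol{\alpha})$ when
$|\boldsymbol{\alpha}| = \rho$:

$$
P[(\rho - \Delta) < |\mathbf{n}| \le \rho] \approx \frac{1}{(2\pi)^{N/2}}\, e^{-\rho^2/2}\, [V(\rho) - V(\rho - \Delta)]. \tag{5D.7}
$$

But

$$
\begin{aligned}
V(\rho) - V(\rho - \Delta) &= B_N \big[ \rho^N - (\rho - \Delta)^N \big] \\
&= B_N \Big[ N \rho^{N-1} \Delta - \frac{N(N-1)}{2}\, \rho^{N-2} \Delta^2 + \cdots \Big] \\
&\approx N B_N\, \rho^{N-1} \Delta; \qquad \frac{\Delta}{\rho} \ll \frac{1}{N - 1}.
\end{aligned}
\tag{5D.8}
$$

Therefore

$$
P[(\rho - \Delta) < |\mathbf{n}| \le \rho] \approx \frac{B_N}{(2\pi)^{N/2}}\, e^{-\rho^2/2}\, N \rho^{N-1} \Delta. \tag{5D.9}
$$

We can also write $P[(\rho - \Delta) < |\mathbf{n}| \le \rho]$ in terms of the density function
of the random variable $|\mathbf{n}|$:

$$
P[(\rho - \Delta) < |\mathbf{n}| \le \rho] \approx p_{|\mathbf{n}|}(\rho)\, \Delta.
$$

Substituting back in Eq. 5D.9, canceling the $\Delta$'s, and noting that in the
limit as $\Delta$ tends to zero the approximations become exact, we have

$$
p_{|\mathbf{n}|}(\rho) = \frac{N B_N}{(2\pi)^{N/2}}\, \rho^{N-1}\, e^{-\rho^2/2}. \tag{5D.10a}
$$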

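The constant $B_N$ may now be evaluated by use of the fact that the area
under any probability density is unity:

$$
1 = \int_{0}^{\infty} p_{|\mathbf{n}|}(\rho)\, d\rho
= \frac{N B_N}{(2\pi)^{(N-1)/2}} \int_{0}^{\infty} \rho^{N-1}\, \frac{1}{\sqrt{2\pi}}\, e^{-\rho^2/2}\, d\rho. \tag{5D.10b}
$$

There are two cases: when N is odd, N - 1 is even; the integral is
therefore one half the (N - 1)st moment of the unit-variance Gaussian
density function. From Eqs. 5D.10b and 2.145,

$$
B_N = \frac{2^N\, \pi^{(N-1)/2}\, [(N-1)/2]!}{N!}; \qquad N \text{ odd}. \tag{5D.11}
$$

When N is even, N - 1 is odd. Making the change of variable $\beta = \rho^2/2$,
we have

$$
\int_{0}^{\infty} \rho^{N-1}\, \frac{1}{\sqrt{2\pi}}\, e^{-\rho^2/2}\, d\rho
= \frac{2^{(N-2)/2}}{\sqrt{2\pi}} \int_{0}^{\infty} \beta^{(N-2)/2}\, e^{-\beta}\, d\beta. \tag{5D.12}
$$

Repeated integration by parts yields

$$
\int_{0}^{\infty} \beta^{(N-2)/2}\, e^{-\beta}\, d\beta = \Big( \frac{N - 2}{2} \Big)!\,. \tag{5D.13}
$$

Substitution of Eqs. 5D.13 and 5D.12 in Eq. 5D.10b leads to

$$
B_N = \frac{(2\pi)^{N/2}}{2^{(N-2)/2}\, \big( \frac{N-2}{2} \big)!\; N}; \qquad N \text{ even}. \tag{5D.14}
$$

As a check, we note that $B_2 = \pi$, $B_3 = \tfrac{4}{3}\pi$.

It may be verified from Stirling's approximation to the factorial that

$$
\frac{B_N}{B_{N-1}} \approx \sqrt{2\pi/N}; \qquad N \text{ large}. \tag{5D.15}
$$

Indeed, from Eq. 5D.14, we have immediately

$$
\frac{B_N}{B_{N-1}} \cdot \frac{B_{N-1}}{B_{N-2}} = \frac{2\pi}{N}; \qquad \text{all even } N. \tag{5D.16}
$$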

PROBLEMS 

5.1a. A communication system has an input buffer for storing messages before 
transmission. The buffer contains $10^4$ magnetic cores, each of which has two
distinct flux states. The message source specifies one of 1024 messages each 
second. How many seconds of source output can be stored in the buffer? 

b. Assume that each (binary) core in (a) costs one dollar, installed. At what 
price per installed core would a buffer using multistate cores, each with eight 
distinct flux states, be competitive?

5.2a. Equation 5.15a provides a useful bound on the error probability with
orthogonal signaling when $E_b > 2\mathcal{N}_0 \ln 2 = E_{\min}$. Use the bound to estimate
how large the number of messages M must be to guarantee P[£] < 10“*. when 

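$$10 \log_{10} \frac{E_b}{E_{\min}} = \begin{cases} \text{(i)} & 1 \text{ db} \\ \text{(ii)} & 3 \text{ db} \\ \text{(iii)} & 6 \text{ db.} \end{cases}$$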

b. Assume that the communication system is connected to a source that pro- 
duces one bit every 10 msec. Determine (for i, ii, and iii) how many seconds of 
source output must be buffered at the transmitter. Also determine the channel 
bandwidth, $W$, required when the number of orthogonal signals, $D$, that can be
transmitted per second equals %W.

5.3 Equation 5.14 bounds the attainable $P[\mathcal{E}]$ for $M$ equally likely orthogonal
signals and an additive white Gaussian noise channel in terms of the signaling
interval $T$, the available transmitter power $P_s$, and the information rate $R$ (in bits
per second). Derive similar bounds for (a) $M$ simplex signals, (b) $M$ biorthogonal
signals. Discuss the relative advantages of the three signaling systems from an
engineering point of view.

5.4 Consider the Gaussian pulse $x(t) = (\sqrt{2\pi}\, \sigma)^{-1} e^{-t^2/2\sigma^2}$ and signals such as

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\, x(t - j\tau); \qquad i = 0, 1, \ldots, M - 1,$$

constructed from successive $\tau$-sec translates of $x(t)$. Constrain the interpulse
interference by requiring 


x(t — It) x(t — jr) dt < 0 




x\t) dt; all j and / ^ j, 


and constrain the signal bandwidth $W$ by requiring that $x(t)$ have no more than
10% of its energy outside the frequency interval $[-W, W]$. Determine the
largest permissible value of the coefficient $k$ in the equation $N = kTW$ when
$N \gg 1$.

5.5 Consider a set of $A$ orthonormal waveforms $\{\varphi_k(t)\}$, $k = 1, 2, \ldots, A$,
each of which is identically zero outside the time interval $[-\tau, 0]$. These
waveforms are used to construct signals $\{s_i(t)\}$ of the form

$$s(t) = \sqrt{P_s \tau}\, [\varphi_{k_1}(t - \tau) + \varphi_{k_2}(t - 2\tau) + \cdots + \varphi_{k_J}(t - J\tau)],$$

in which the $\{k_j\}$ are integers between 1 and $A$. Thus each signal in the set
$\{s_i(t)\}$ is specified by a vector of the form $(k_1, k_2, \ldots, k_J)$.

a. Assume $A = 4$, $J = 5$, and

$$\varphi_k(t) = \begin{cases} \sqrt{2/\tau}\, \sin 2\pi \dfrac{k}{\tau}\, t; & -\tau < t < 0, \\ 0; & \text{elsewhere.} \end{cases}$$

Sketch the signal specified by the vector $(2, 1, 4, 2, 3)$.



b. Consider the set of all distinct waveforms in the form of $s(t)$. How many
waveforms are there in this set for arbitrary $A$ and $J$? Are all of these waveforms
mutually orthogonal?

c. Consider the ensemble comprising all waveforms of (b), to each of which is 
assigned equal probability. Pick two waveforms, independently at random, 
from this ensemble. What is the probability that they differ in h of the J 
positions? 

d. What is the smallest attainable probability of error if these two waveforms
(differing in $h$ of $J$ positions) are used as the signals in communicating one of two
equally likely messages over an additive Gaussian noise channel with
$S_n(f) = \mathcal{N}_0/2$?

e. What is the average, say $\overline{P_2[\mathcal{E}]}$, of the error probability of (d) over the
ensemble of (c)? Show that

$$\overline{P_2[\mathcal{E}]} < 2^{-N R_0},$$

where $N = AJ$ is the dimensionality of the (code base) ensemble. Derive an
expression for the value of $R_0$. Discuss the relation between this value and that
obtained by specializing the expression of Eq. 5.56. Hint. Note that the
dimensionality of each particular signal in the code base is $J$, not $N$.

f. Use the union bound to show that the average probability of error, $\overline{P[\mathcal{E}]}$,
for $M = 2^{N R_N}$ equally likely messages satisfies

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}$$

when the M signals are drawn independently at random from the ensemble of (c). 

g. Verify that the energy per bit is given by 



What is the minimum value of E b for which the bound of (f) is useful? Show 
that the bound of (f) can be rewritten in the form 

m < -" P {-*[^ ln i + U- 1 )^ - ln2 ]}’ 

where $x = P_s \tau / 2\mathcal{N}_0$, and determine the minimum value of $E_b$ in the limit $x \to 0$.

h. Compare the limiting value of $E_b$ in (g) with that obtained in the text for
binary antipodal signals. How would the two values compare if the letters
$\{\varphi_k(t)\}$ formed a simplex rather than an orthogonal set?

i. Show that the minimum value of $E_b$ does not greatly exceed its limiting
value when $A \gg 1$ and the letter duration $\tau$ is chosen to satisfy

$$\frac{P_s \tau}{\mathcal{N}_0} = 2 \ln A.$$

j. For large $A$ and $\tau$ chosen as in (i) show that the number of dimensions per
second, $D$, required by the signaling system is

$$D = \frac{A}{2 \ln A}\, \frac{P_s}{\mathcal{N}_0}.$$

If in addition we choose $R$ (in bits per second) so that $E_b$ is twice the limiting
value determined in (h), show that

$$D = R\, \frac{A}{\ln A}\, 2 \ln 2.$$

5.6 A set of $A$ phase-shift waveforms $\{\psi_k(t)\}$, in which $L > 0$ is an integer and

$$\psi_k(t) = \begin{cases} \sqrt{2/\tau}\, \sin\!\left( 2\pi \dfrac{L}{\tau}\, t + \dfrac{2\pi k}{A} \right); & -\tau < t < 0, \quad k = 1, 2, \ldots, A, \\ 0; & \text{otherwise,} \end{cases}$$

is used to construct coded signals $\{s_i(t)\}$ in the form

$$s(t) = \sqrt{P_s \tau} \sum_{j=1}^{J} \psi_{k_j}(t - j\tau).$$

As in Problem 5.5, any $s_i(t)$ is specified by a vector $(k_1, k_2, \ldots, k_J)$ whose
components are integers between 1 and $A$.

a. Consider a code-base ensemble in which each distinct waveform in the
form of $s(t)$ is assigned equal probability. Assuming that there is additive white
Gaussian noise and that the signals $\{s_i(t)\}$ are chosen independently at random,
show that the value of $R_0$ in the bound $\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}$ is

*„ = A > 3 - 

What is the value of $R_0$ when $A = 2$? (Note that the dimensionality of the code
base, $N$, is different for $A = 2$ and $A \ge 3$.)

b. Discuss the relation between the values of $R_0$ obtained in (a) for $A = 2$ and
$A = 4$ and the value of $R_0$ for binary antipodal signaling.

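c. Show that in the limit as $E_N/\mathcal{N}_0 \to 0$

$$R_0 \approx \frac{E_N}{2 \mathcal{N}_0 \ln 2}; \qquad \text{for every } A \ge 2,$$

in which $E_N$ is the average energy transmitted per dimension. Discuss and
interpret this rather surprising result. Hint.

$$\sum_{k=1}^{A-1} \sin^2 \frac{k\pi}{A} = \frac{A}{2}; \qquad A \ge 2.$$
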


5.7a. Show that Hartley’s result, Eq. 1.1, can be written in the form
$R_N = \log_2 (1 + A/\Delta)$. Compare and contrast the conditions and content of this result
and the capacity theorem for additive white Gaussian noise,

$$C_N = \tfrac{1}{2} \log_2 (1 + 2E_N/\mathcal{N}_0).$$

b. Obtain rough numerical comparisons of the two statements by setting 
A = VyEu and choosing A so that the error probability per dimension in 
Hartley’s formulation is $10^{-2}$; $10^{-4}$; $10^{-6}$. Discuss.

5.8 Consider communicating over a channel disturbed by additive Gaussian
noise with $S_n(f) = \mathcal{N}_0/2$ by means of signals constructed from $D$ orthonormal
waveforms per second. Constrain the average power of each signal to be no
greater than $P_s$.

a. Show that the channel capacity in bits per second increases monotonically
toward its maximum value, $C_\infty = (P_s/\mathcal{N}_0) \log_2 e$, as $D$ increases, whereas the
capacity in bits per dimension decreases monotonically with $D$.

b. Similarly, show for binary antipodal codes that $R_0$ decreases, but $D R_0$
increases, monotonically with $D$. Show that the value of $D$ required in order to
achieve the bound

$$\overline{P[\mathcal{E}]} < 2^{-T[\frac{1}{2} C_\infty (1 - \alpha) - R]}$$

when $0 < \alpha \ll 1$ is given approximately for antipodal codes by


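5.9a. Prove by induction that for any set of $k$ events $\{A_i\}$

$$P\left[ \bigcup_{i=1}^{k} A_i \right] \ge \sum_{i=1}^{k} P[A_i] - \sum_{i=2}^{k} \sum_{j=1}^{i-1} P[A_i A_j].$$

Hint. Apply the bound

$$P\left[ \bigcup_{i=1}^{k-1} B_i \right] \le \sum_{i=1}^{k-1} P[B_i]$$

to the events $\{A_i A_k\}$, $i < k$.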

b. Now let one of $M = 2^{RT}$ equally likely messages be transmitted over an
additive white Gaussian channel by means of $M$ orthogonal signals, each with
energy $E_s$. Use the theorem of (a) to prove that

p[s] >(m- i) Q(VEjJf 0 ) - ~ mt, 

where the overhead bar denotes the mean of $Q^2(y)$ when $y$ is a unit-variance
Gaussian random variable with mean $\sqrt{2E_s/\mathcal{N}_0}$.

c. By using the bounds 

e-“ 2 / 2 / i \ 

h h a >°> 


G(«) < 


e_“ 2 / 2 ; a > 0, 


1 1 ; « < 0 , 

and the fact (see Appendix 7C for the general result) that 


1 I’m q — t» z /(1+2<t 2 > 

e -{y-viPl2o 2 e ~ v * dy < e -P = , 

■v 27 t(T J 0 Vl+2a 3 


prove that 


P[g] > jK-W*”-®]; R < , 

6

in which the coefficient $B$ decays as $1/\sqrt{T}$ when $T$ is large. [As stated in the
text, stronger (but more laborious) arguments may be used to obtain a lower
bound valid for all $R < C_\infty$.]

5.10 Use Stirling’s approximation,

$$N! \approx \sqrt{2\pi N}\, (N/e)^N,$$

to prove that $B_N/B_{N-1} \approx \sqrt{2\pi/N}$, $N$ large, where $B_N$ (given in Eqs. 5D.11 and
5D.14) is the volume of an $N$-dimensional sphere of unit radius.



5.11 In Eq. 5.67c and in Appendix 5D we consider the probability density
function of the length of an $N$-component random vector $\mathbf{n}$, each component of
which is a statistically independent, zero mean, unit-variance Gaussian random
variable. The probability density function of the squared length of $\mathbf{n}$, say

$$y_N = |\mathbf{n}|^2 = \sum_{i=1}^{N} n_i^2,$$

is called the “chi-square density function with $N$ degrees of freedom.” Let us
denote this density function by $p_{y_N}$.

a. Use the result of Appendix 7C to determine the characteristic function of
$y_N$.

b. Express $p_{y_N}$ in terms of its characteristic function and by means of a single
integration-by-parts show that



d. Show that the transformation $|\mathbf{n}| = \sqrt{y_N}$ reduces $p_{y_N}$ to the density
function given in Eq. 5D.10a.


6 


Implementation of Coded Systems 


The problem of finding appropriate classes of signals for the communication 
of data over bandlimited channels disturbed by additive white Gaussian 
noise was discussed in Chapter 5. We concluded that power-constrained 
communication systems, using signals of T sec duration, exist which 
simultaneously (1) require signals (codewords) whose dimensionality, N, 
increases only linearly with T; (2) accommodate a number of messages, 
M, that increases exponentially with T; (3) afford a probability of error 
that decreases exponentially with T. More specifically, we considered 
systems that communicate one of M equally likely messages over an 
additive white Gaussian noise channel by means of signals, 

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\, \varphi_j(t); \qquad i = 0, 1, \ldots, M - 1, \tag{6.1}$$

in which each coefficient $s_{ij}$ is chosen to be one of $A$ amplitudes equally
spaced over the interval $[-\sqrt{E_N}, +\sqrt{E_N}]$. For signals of this form the
probability of error achievable with optimum a posteriori probability
computing receivers satisfies the simple union bound

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}, \tag{6.2a}$$

where

$$M = 2^{N R_N} \tag{6.2b}$$

and $R_0$ (as a function of the energy-to-noise ratio per dimension, $E_N/\mathcal{N}_0$)
is given by the curves of Figs. 5.17 and 5.18.

If we know that such communication systems exist in principle, the 
remaining task is to determine how to build them. This is the subject of 
this chapter. In particular, given an appropriate set of orthonormal
waveforms $\{\varphi_j(t)\}$, we are confronted with the problems of transmitter
and receiver implementation. The latter, which is by far the more
grievous, can be separated into problems concerning quantization of the
received signal, decoding, and two-way systems. We shall consider the
different problem areas in the order listed.

In the design of a communication system one is never interested in
building an “optimum” system irrespective of cost. The appropriate 
engineering objective is to build the most economical system that meets a 
required standard of performance. Given a transmission channel, two 
factors that relate directly to questions of economy are (1) the data rate 
R, in bits per second, at which the channel is used, and (2) the complexity 
of the terminal equipment required to meet the performance standard 
at rate R. 

That these two factors are interrelated is made evident by rewriting 
Eq. 6.2 in terms of the time parameters T and R. If D is the number of 
orthogonal functions per second accommodated by the channel and T 
is the time duration of each signal in seconds, we have 

$$N = DT \tag{6.3a}$$

$$M = 2^{RT} \tag{6.3b}$$

and therefore

$$\overline{P[\mathcal{E}]} < 2^{-T[D R_0 - R]}. \tag{6.3c}$$

It is clear from Eq. 6.3c that any required standard of performance,
measured in terms of the allowable $\overline{P[\mathcal{E}]}$, can be attained by choosing $T$,
$D R_0$, and $R$ appropriately. In its simplest expression the engineering
design problem is to determine the three parameters in such a way that
the over-all cost of the system is minimum. Each parameter affects the
cost qualitatively in the following way when the other parameters are
fixed.

1. If we increase $T$, the cost increases: each signal (codeword) is
specified by more vector components and there are many more signals
($M = 2^{RT}$) in the code.

2. If we increase $D R_0$, the cost increases: the maximum value of $D$ is
constrained by the transmission channel bandwidth and the maximum
value of $R_0$ is constrained by the allowable value of $E_N/\mathcal{N}_0$. Forcing $D$
close to its maximum value is costly, as is increasing $E_N/\mathcal{N}_0$.

3. If we decrease $R$, the cost increases: three complete systems, each
with rate $R$, are required to communicate the same amount of data per
second as one system with rate $3R$.
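The interplay of the three parameters in Eq. 6.3c can be made concrete with a short calculation. The sketch below is illustrative only: the function name and the numerical values in the usage loop are assumptions chosen for the example, not figures from the text. It solves the bound $2^{-T[D R_0 - R]} \le P$ for the smallest signal duration $T$.

```python
import math


def min_signal_duration(p_error: float, d: float, r0: float, rate: float) -> float:
    """Smallest T (sec) for which the bound 2^(-T[D*R0 - R]) of Eq. 6.3c is <= p_error.

    d     : orthogonal functions per second accommodated by the channel (D)
    r0    : error exponent R_0, in bits per dimension
    rate  : data rate R, in bits per second
    """
    if not 0.0 < p_error < 1.0:
        raise ValueError("p_error must lie strictly between 0 and 1")
    margin = d * r0 - rate  # exponent per second; the bound decays only if positive
    if margin <= 0.0:
        raise ValueError("the bound is useful only when R < D * R0")
    return math.log2(1.0 / p_error) / margin


if __name__ == "__main__":
    # Hypothetical numbers: D = 2000 dimensions/sec, R_0 = 0.5 bit/dimension.
    for rate in (250.0, 500.0, 750.0, 900.0):
        t = min_signal_duration(1e-6, 2000.0, 0.5, rate)
        print(f"R = {rate:6.1f} b/s  T = {t:.4f} s  N = DT = {2000.0 * t:.1f}")
```

As $R$ approaches $D R_0$ the required $T$, and with it the codeword length $N = DT$, grows without limit, which is the cost trade-off described in items 1 to 3.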

The appropriate choice of $T$, $D R_0$, and $R$ in any given communication
problem depends on the details of that problem. For instance, whether
it is more economical to use three channels at rate $R$ with simple terminal
equipment or to use one channel at rate $3R$ with complicated terminal
equipment depends on the relative costs of transmission facility and
complex terminal equipment. Such questions cannot be considered
quantitatively until “terminal equipment complexity” has been defined
in a meaningful way and the growth of complexity (hence cost) determined
as a function of $T$ and $D R_0$.

It is with these objectives in mind that we now address the problems of 
transmitter and receiver implementation. Initially, the number of degrees 
of freedom per second, D , is considered fixed. In Section 6.5 we discuss 
an example in which D is also a design parameter to be specified. 


6.1 TRANSMITTER IMPLEMENTATION 

The structure of the signals in Eq. 6.1 suggests a transmitter designed
in two stages as shown in Fig. 6.1 (and previously, with different
nomenclature, in Fig. 4.12). The first stage, called the coder (or encoder),
observes the message to be communicated, $m_i$, and generates a
corresponding sequence of $N$ output digits, $\mathbf{s}_i$. The second stage, called the
modulator or waveform generator, accepts the coder output

$$\mathbf{s}_i = (s_{i1}, s_{i2}, \ldots, s_{iN}) \tag{6.4a}$$

and generates the waveform

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\, \varphi_j(t). \tag{6.4b}$$

Figure 6.1 Two-stage transmitter (coder followed by modulator): $i = 0, 1, \ldots, M - 1$.

First, let us investigate the complexity of the modulator as a function
of $T$. We are interested in the case in which a new transmitter input
message is accepted, and a new waveform generated, every $T$ sec. If the
$\{\varphi_j(t)\}$ are chosen to be nonoverlapping time translations of a single
waveform with duration $T/N$, as shown in Fig. 6.2, the same signal
generator and amplitude modulator can be used over and over again, $N$
times in succession. Alternatively, we can start with a small set of
finite-duration orthonormal functions, such as the sinusoids $\varphi_1(t)$, $\varphi_2(t)$, and
$\varphi_3(t)$ in Fig. 6.3, and choose the $\{\varphi_j(t)\}$ to be this set and their
nonoverlapping time translates. In either case, since $T/N$ is constant, the complexity
of the waveform generator part of the transmitter is relatively independent
of $T$.

Figure 6.3 Orthonormal functions with combined time and frequency translation.

This conclusion is not true of the coder part of the transmitter. We
shall see next, however, that we can build an efficient coder, the complexity
of which depends only linearly on $T$.

The Encoding Problem 

The first problem in encoder design is that of input message storage.
As in Chapter 5, we assume that the input message $m$ during each $T$-sec
interval is a sequence of $K = N R_N$ binary digits, say $\mathbf{x}$. The sequence $\mathbf{x}$
may be any one of the set $\{\mathbf{x}_i\}$ of all $2^K$ vectors with components 0 or 1.
We may visualize the data source as providing one new binary digit of $\mathbf{x}$
to the transmitter every $T/K = 1/R$ sec. In this case part of the encoder
must be devoted to accepting and storing the vector $\mathbf{x}$ as it arrives,
component by component. A convenient device for accomplishing this
is a shift-register, which accepts binary digits at its input and shifts its
contents one stage to the right each time a new digit arrives, as shown in
Fig. 6.4. Since $K$ is proportional to $N$, hence to $T$, the complexity of
such a shift-register depends linearly on $T$.

In addition to accumulating the input message vector, the coder must
implement an appropriate mapping $\mathbf{x}_i \Rightarrow \mathbf{s}_i$, $i = 0, 1, \ldots, M - 1$. The
problems involved are not trivial. Indeed, the construction of an
appropriate coder could easily be an engineering impossibility. To see this, we
need consider only the magnitude of the numbers involved; since

$$M = 2^K = 2^{RT} = 2^{N R_N}, \tag{6.5}$$

the required number of vectors in the set $\{\mathbf{s}_i\}$ is enormous when $T$ is large.
For example, $N = 200$ and $R_N = \tfrac{1}{2}$ imply $M = 2^{100} \approx 10^{30}$.

Figure 6.4 Input data storage: digits shifted out of the right-hand side of the $K$-bit
shift register are discarded.
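The input-storage behavior of Fig. 6.4 can be sketched in a few lines of Python. This is a software analogy under stated assumptions, not a description of any hardware in the text: the class name `ShiftRegister` and its methods are hypothetical, and the register is assumed to start with all stages holding 0.

```python
from collections import deque


class ShiftRegister:
    """K-stage binary shift register: new digits enter at the left,
    the contents move one stage to the right, and the digit shifted
    out of the right-hand side is discarded."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("the register needs at least one stage")
        # A deque with maxlen=k drops the rightmost item on appendleft when full.
        self._stages = deque([0] * k, maxlen=k)

    def shift_in(self, bit: int) -> None:
        if bit not in (0, 1):
            raise ValueError("only binary digits can be stored")
        self._stages.appendleft(bit)

    def contents(self) -> tuple:
        """Stage contents, leftmost (most recent digit) first."""
        return tuple(self._stages)


if __name__ == "__main__":
    register = ShiftRegister(3)
    for digit in (1, 0, 1, 1):  # four digits into a three-stage register
        register.shift_in(digit)
        print(register.contents())
```

The storage needed is one stage per input digit, so the cost of this part of the coder grows linearly with $K$, as the text states.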

For large $K$ it is obviously impossible to implement the coder by
choosing each of the $M$ vectors $\{\mathbf{s}_i\}$ arbitrarily from the code base. (As
in Chapter 5, the term “code base” refers to the set of all $A^N$ $N$-component
vectors whose components belong to the $A$-letter transmitter alphabet
$\{a_l\}$.) To do so would require provisions for storing each selected vector
in an ordered table containing $MN$ entries, as shown in Fig. 6.5, and for
reading out the $i$th table entry, $\mathbf{s}_i$, whenever $\mathbf{x}_i$ is the message input.
The complexity of such a table-storage facility is proportional to the table
size, $MN$, which grows with the time interval $T$ as $T 2^{RT}$. The size of the
memory that would be required is simply too large.
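To see how quickly the table size $MN$ outruns any practical memory, the following Python sketch evaluates $M = 2^{N R_N}$ and $MN$ exactly with integer arithmetic. The function names are illustrative assumptions; the rate is passed as a `Fraction` so that $K = N R_N$ is checked to be a whole number of bits.

```python
from fractions import Fraction


def message_count(n: int, rate_per_dim: Fraction) -> int:
    """M = 2^(N * R_N), the number of codewords (Eq. 6.5)."""
    k = Fraction(rate_per_dim) * n
    if k.denominator != 1 or k < 0:
        raise ValueError("N * R_N must be a nonnegative integer number of bits")
    return 2 ** k.numerator


def table_entries(n: int, rate_per_dim: Fraction) -> int:
    """M * N, the number of stored components in a table-storage coder."""
    return message_count(n, rate_per_dim) * n


if __name__ == "__main__":
    half = Fraction(1, 2)
    for n in (20, 40, 100, 200):
        print(f"N = {n:3d}  M = {message_count(n, half):.3e}  MN = {table_entries(n, half):.3e}")
```

For $N = 200$ and $R_N = \tfrac{1}{2}$ the table would need $200 \times 2^{100}$ entries, which illustrates the exponential growth described above.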

On the other hand, the error probability bound of Eq. 6.2 has thus far
been established only by considering the average probability of error
over the ensemble of all $A^{NM}$ possible codes. As we have seen (cf. p. 304),
most of the codes in this ensemble must be good ones. But we have also
seen that some codes (for instance, those in which all $\mathbf{s}_i$ are the same)
are bad. It is not inconceivable that all of the easily implementable codes
might be bad and that only those requiring table-storage implementation
are good. The dilemma is obvious: it is not yet clear that any code that
can be instrumented obeys our error probability bound.

Figure 6.5 Table storage of an arbitrary code ($N$ components per entry).

Recapitulation of the Derivation of R 0 

A way exists out of this quandary: our error probability bound applies
also to a smaller ensemble of communication systems, each of which uses
a code that is easily instrumented. To prove this, we now investigate
more carefully conditions under which the bound

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]} \tag{6.6}$$

is valid.

The starting point of the derivation of this bound (cf. Eq. 5.47) is the
union inequality

$$P[\mathcal{E} \mid m_k] \le \sum_{\substack{i=0 \\ i \ne k}}^{M-1} P_2[\mathbf{s}_i, \mathbf{s}_k], \tag{6.7}$$

in which $P_2[\mathbf{s}_i, \mathbf{s}_k]$ is the probability of error when specific vectors $\mathbf{s}_i$
and $\mathbf{s}_k$ are used to communicate one of two equally likely messages. For
an ensemble of communication systems chosen in such a way that the
mean of $P_2[\mathbf{s}_i, \mathbf{s}_k]$ is bounded independent of the indices $i$ and $k$ by

$$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} = \overline{P_2[\mathcal{E}]} \le 2^{-N R_0}; \qquad \text{for all } i \text{ and } k, \tag{6.8}$$

substitution of Eq. 6.8 in Eq. 6.7 yields

$$\overline{P[\mathcal{E} \mid m_k]} \le (M - 1)\, 2^{-N R_0}, \tag{6.9}$$

which in turn implies Eq. 6.6, because $M - 1 < M = 2^{N R_N}$. Thus Eq. 6.8 represents the crucial property
that an ensemble must evidence for the derivation of Eq. 6.6 to be valid.

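For the ensemble of codes considered in Chapter 5, the validity of Eq.
6.8 was ensured for all $i$ and $k$ by the nature of the probability assignment
to the codes of the ensemble; the probability that any vector in the code
base was assigned to the signal $\mathbf{s}_k$ did not depend on $k$ nor on which
code-base vectors were assigned to the $M - 1$ other signals $\{\mathbf{s}_i\}$, $i \ne k$.
As a consequence the expectation

$$\sum_{\substack{\text{all } \boldsymbol{\alpha},\, \boldsymbol{\beta} \text{ in} \\ \text{the code base}}} P_2[\boldsymbol{\alpha}, \boldsymbol{\beta}]\, P[\mathbf{s}_i = \boldsymbol{\alpha},\, \mathbf{s}_k = \boldsymbol{\beta}] \tag{6.10}$$

was independent of $i$ and $k$. Moreover, the statistical independence of $\mathbf{s}_i$
and $\mathbf{s}_k$,

$$P[\mathbf{s}_i = \boldsymbol{\alpha},\, \mathbf{s}_k = \boldsymbol{\beta}] = P[\mathbf{s}_i = \boldsymbol{\alpha}]\, P[\mathbf{s}_k = \boldsymbol{\beta}]; \qquad \text{for all } (i, k) \text{ and all } (\boldsymbol{\alpha}, \boldsymbol{\beta}) \text{ in the code base}, \tag{6.11a}$$

together with the independence of the components of each $\mathbf{s}_i$,

$$P[\mathbf{s}_i = \boldsymbol{\alpha}] = \prod_{j=1}^{N} P[s_{ij} = \alpha_j]; \qquad \text{all } i \text{ and } \boldsymbol{\alpha}, \tag{6.11b}$$

made it possible to calculate the numerical value of $R_0$.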

Now consider two distinct ensembles of communication systems such
that the probability assigned to the event $[\mathbf{s}_i = \boldsymbol{\alpha},\, \mathbf{s}_k = \boldsymbol{\beta}]$ in one ensemble
is the same as the probability assigned to this event in the other. If this
is true for all $(i, k)$ and all $(\boldsymbol{\alpha}, \boldsymbol{\beta})$ in the code base, it is clear that $\overline{P_2[\mathcal{E}]}$ is
the same for both ensembles. Thus Eqs. 6.11a and b incorporate the only
properties of an ensemble we need to establish the random coding bound
and the value of $R_0$.

Equation 6.11a requires only that any two signals $\mathbf{s}_i$ and $\mathbf{s}_k$ be statistically
independent; although heretofore we have considered an ensemble in
which all $M$ signals $\{\mathbf{s}_i\}$ are statistically independent, it is sufficient that
they be independent by pairs. The sufficiency of this much weaker
condition enables us to validate our random coding bound for an ensemble
of communication systems, each of which has an easily implemented
coder.

Parity-Check Codes 

We now consider an ensemble of codes which simultaneously meets
two requirements: (a) over the ensemble, codewords are statistically
independent by pairs and (b) each code in the ensemble can be implemented
by means of a device whose complexity (appropriately measured) grows
linearly with $K = RT$.

The parameter $K$ is called the constraint span of the code. We first
treat the binary case in which the codewords $\{\mathbf{s}_i\}$ are vectors with
components restricted to $\pm\sqrt{E_N}$.

The coding device is diagrammed in Fig. 6.6. Each of the first $K$ blocks
in the top rectangle (which we call the $x$-register) represents a stage in a
binary shift register. The encoder input sequence $\mathbf{x}$ is fed into this shift
register one bit at a time, so that at the end of $T$ sec the $K$ binary digits
of $\mathbf{x}$ are stored in the $K$ shift-register stages in the positions indicated in
the figure. The $(K + 1)$th square, labeled $x_0$, represents a storage element
that always contains 1.

Figure 6.6 Parity-check coder. There is one modulo-2 adder associated with each
stage $y_j$ of the $y$-register, $j = 1, 2, \ldots, N$. The switches close at time $t = T$.

The $N$ symbols $\oplus$ in Fig. 6.6 represent modulo-2 adders, and the lines
from squares to adders represent connections. The output at time $T$
from the $j$th adder, say $y_j$, is the modulo-2 sum of the digits $\{x_h\}$ stored in
the stages of the $x$-register to which the $j$th adder is connected. Since
modulo-2 addition is defined by the equations

$$0 \oplus 0 = 1 \oplus 1 = 0$$

$$0 \oplus 1 = 1 \oplus 0 = 1,$$

we see that $y_j$ is 1 if the number of 1’s stored in these stages is odd and
zero otherwise. The $\{y_j\}$ are called parity checks and the device is called
a parity-check coder.

At time $T$ the $\{y_j\}$ are fed in parallel to the lower rectangle, called the
$y$-register, each square of which again represents a stage of shift-register.
During the interval $[T, 2T]$, these $N$ binary digits are shifted out, one at
a time. Thus the $y$-register output is some sequence, say $\mathbf{y}$, of 0’s and 1’s:

$$\mathbf{y} = (y_1, y_2, \ldots, y_N); \qquad y_j = 0, 1; \quad \text{all } j. \tag{6.12}$$

We can convert this sequence into a signal vector, $\mathbf{s}$, of the desired binary
form by the simple expedient of transforming 1 into $+\sqrt{E_N}$ and 0 into
$-\sqrt{E_N}$ in the transducer:

$$\begin{aligned} y_j = 1 &\Rightarrow s_j = +\sqrt{E_N} \\ y_j = 0 &\Rightarrow s_j = -\sqrt{E_N}. \end{aligned} \tag{6.13}$$

Since $N = K/R_N$, the complexity of a parity-check coder, measured in
terms of the total number of shift-register stages, is proportional to $K$.

Figure 6.7 Two parity-check coders, each with $x$-register stages $x_3, x_2, x_1, x_0$.

From the description just given, it is obvious that the device in Fig. 6.6
is indeed a coder: given the connections from the adders to the $x$-register,
there is a particular $N$-component output vector $\mathbf{s}_i$ associated with each
$K$-component input vector $\mathbf{x}_i$. It is also apparent that different codes, or
mappings $(\mathbf{x}_i \Rightarrow \mathbf{s}_i)$, $i = 0, 1, 2, \ldots, 2^K - 1$, result when the connections
are made in different ways.

As an example, consider the two coders diagrammed in Fig. 6.7. For
both, $K = 3$ and $N = 5$. By convention, we shall always let $\mathbf{x}_i$ denote
the input vector corresponding to the number $i$ written in binary form,
with the first component taken as the most significant digit. By inspection,
the two mappings $\mathbf{x} \Rightarrow \mathbf{y}$ are as given in Table 6.1. The two codes $\{\mathbf{s}_i\}$ are
obtained by substituting $+\sqrt{E_N}$ for 1 and $-\sqrt{E_N}$ for 0 in the $\{\mathbf{y}_i\}$. Since
the vectors $\{\mathbf{y}_i\}$ for the second device occur in pairs, it is obvious that
this is a poor way to connect the coder; but we shall soon see that most
connections yield good codes.

Insight into the structure of the $\{\mathbf{y}_i\}$ generated by a parity-check coder
is gained by considering the set of connection coefficients $\{f_{hj}\}$. Define
$f_{hj}$ as 1 if the $h$th stage of the $x$-register in Fig. 6.6 and the $j$th modulo-2
adder are connected, and 0 otherwise:

$$f_{hj} = \begin{cases} 1, & x_h \text{ affects } y_j, \\ 0, & \text{otherwise,} \end{cases} \qquad 0 \le h \le K. \tag{6.14}$$

Table 6.1 Codes Obtained from Two Parity-Check Coders

                      First Coder        Second Coder
        x1 x2 x3     y1 y2 y3 y4 y5     y1 y2 y3 y4 y5
x_0:     0  0  0      1  1  1  0  1      1  1  1  0  0
x_1:     0  0  1      1  1  0  0  0      1  0  1  1  0
x_2:     0  1  0      0  0  1  1  1      0  1  0  1  1
x_3:     0  1  1      0  0  0  1  0      0  0  0  0  1
x_4:     1  0  0      1  0  1  1  0      0  1  0  1  1
x_5:     1  0  1      1  0  0  1  1      0  0  0  0  1
x_6:     1  1  0      0  1  1  0  0      1  1  1  0  0
x_7:     1  1  1      0  1  0  0  1      1  0  1  1  0

For example, in the first coder of Fig. 6.7 each of the coefficients

$$\begin{aligned} & f_{01},\ f_{02},\ f_{03},\ f_{05} \\ & f_{12},\ f_{14},\ f_{15} \\ & f_{21},\ f_{22},\ f_{24} \\ & f_{33},\ f_{35} \end{aligned}$$

is 1, and all other $f_{hj}$ are 0.

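The set $\{f_{hj}\}$ completely specifies the coder connections, hence the
mapping $\mathbf{x}_i \Rightarrow \mathbf{y}_i$, all $i$. In particular, we observe from Fig. 6.6 that

$$\begin{aligned}
y_1 &= f_{01} \oplus x_1 f_{11} \oplus x_2 f_{21} \oplus \cdots \oplus x_K f_{K1} \\
y_2 &= f_{02} \oplus x_1 f_{12} \oplus x_2 f_{22} \oplus \cdots \oplus x_K f_{K2} \\
&\;\;\vdots \\
y_N &= f_{0N} \oplus x_1 f_{1N} \oplus x_2 f_{2N} \oplus \cdots \oplus x_K f_{KN},
\end{aligned} \tag{6.15a}$$

where the $\{x_h\}$ are the components of the input vector

$$\mathbf{x} = (x_1, x_2, \ldots, x_K). \tag{6.15b}$$

Equations 6.15 can be simplified by the use of vector notation. The
modulo-2 sum of two binary vectors, say $\mathbf{a}$ and $\mathbf{b}$, is defined as

$$\mathbf{a} \oplus \mathbf{b} = (a_1 \oplus b_1,\ a_2 \oplus b_2,\ \ldots,\ a_N \oplus b_N). \tag{6.16}$$

For example,

$$(0, 1, 1, 0) \oplus (1, 0, 1, 0) = (1, 1, 0, 0)$$

and, for any binary vector $\mathbf{c}$,

$$\mathbf{c} \oplus \mathbf{c} = (0, 0, \ldots, 0) = \mathbf{0}. \tag{6.17}$$

With this definition, Eq. 6.15a can be written in the more concise form

$$\mathbf{y} = \mathbf{f}_0 \oplus x_1 \mathbf{f}_1 \oplus x_2 \mathbf{f}_2 \oplus \cdots \oplus x_K \mathbf{f}_K, \tag{6.18a}$$

in which

$$\mathbf{y} = (y_1, y_2, \ldots, y_N) \tag{6.18b}$$

$$\mathbf{f}_h = (f_{h1}, f_{h2}, \ldots, f_{hN}); \qquad 0 \le h \le K. \tag{6.18c}$$

Thus the connection vectors $\{\mathbf{f}_h\}$ for the first coder in Fig. 6.7 are

$$\begin{aligned}
\mathbf{f}_0 &= (1, 1, 1, 0, 1) \\
\mathbf{f}_1 &= (0, 1, 0, 1, 1) \\
\mathbf{f}_2 &= (1, 1, 0, 1, 0) \\
\mathbf{f}_3 &= (0, 0, 1, 0, 1).
\end{aligned}$$

When $\mathbf{x}$ is the binary vector each of whose components except $x_h$ is 0, the
corresponding output vector is $\mathbf{y} = \mathbf{f}_0 \oplus \mathbf{f}_h$. More generally, $\mathbf{y}$ is the
modulo-2 sum of $\mathbf{f}_0$ and those $\mathbf{f}_h$ corresponding to nonzero components
of $\mathbf{x}$.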
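The vector form of Eq. 6.18a translates directly into a few lines of Python. The sketch below is an illustration, not the book's implementation: the function names and the constant `F_FIRST_CODER` are hypothetical, and the connection vectors are those of the first coder of Fig. 6.7 ($K = 3$, $N = 5$). It also applies the transducer rule of Eq. 6.13.

```python
import math
from itertools import product

# Connection vectors f_0, f_1, f_2, f_3 of the first coder in Fig. 6.7.
F_FIRST_CODER = [
    (1, 1, 1, 0, 1),
    (0, 1, 0, 1, 1),
    (1, 1, 0, 1, 0),
    (0, 0, 1, 0, 1),
]


def xor_vectors(a, b):
    """Component-by-component modulo-2 sum of two binary vectors (Eq. 6.16)."""
    return tuple(u ^ v for u, v in zip(a, b))


def parity_check_encode(x, f):
    """y = f_0 XOR x_1 f_1 XOR ... XOR x_K f_K (Eq. 6.18a)."""
    if len(x) != len(f) - 1:
        raise ValueError("need one connection vector per input digit, plus f_0")
    y = tuple(f[0])
    for x_h, f_h in zip(x, f[1:]):
        if x_h:  # only nonzero input digits contribute their f_h
            y = xor_vectors(y, f_h)
    return y


def to_signal(y, energy_per_dim):
    """Transducer of Eq. 6.13: 1 -> +sqrt(E_N), 0 -> -sqrt(E_N)."""
    amplitude = math.sqrt(energy_per_dim)
    return tuple(amplitude if bit else -amplitude for bit in y)


if __name__ == "__main__":
    # Inputs are listed in binary order, first component most significant.
    for i, x in enumerate(product((0, 1), repeat=3)):
        print(f"x_{i} = {x}  ->  y = {parity_check_encode(x, F_FIRST_CODER)}")
```

The printed mapping can be compared row by row with the first-coder column of Table 6.1. Note that the storage required is only the $(K + 1)$ connection vectors, rather than a table of $2^K$ codewords.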

The ensemble of binary codes. We now discuss the set of all binary
codes that can be generated by a parity-check coder and show that the
average probability of error over this set obeys the random coding bound
of Eq. 6.6 without degradation of $R_0$. For any $K$ and $N$ a particular code
is specified by the set of $(K + 1)$ connection vectors $\{\mathbf{f}_h\}$, $h = 0, 1, 2, \ldots, K$.
Each of these vectors has $N$ components and each component can be
0 or 1. There are $N(K + 1)$ components $f_{hj}$ to be assigned, hence $2^{N(K+1)}$
ways to connect the coder.

Suppose that each of the $2^{N(K+1)}$ coders appears in an ensemble with
equal probability, $2^{-N(K+1)}$. This implies that each of the connection
coefficients $\{f_{hj}\}$ is equally likely to be 0 or 1 and that each coefficient is
statistically independent of all others. An equivalent statement is that
each of the connection vectors $\mathbf{f}_h$ is equally likely to be any one of the $2^N$
binary vectors of length $N$ and that the $\{\mathbf{f}_h\}$ are statistically independent.

We say that a random binary vector (such as any $\mathbf{f}_h$) is EL if its
components are statistically independent and equally likely to be 0 or 1. In
proving the random coding bound for parity-check coders, the following
property of the modulo-2 sum of two random $N$-component binary
vectors, say $\mathbf{a}$ and $\mathbf{b}$, is of central importance:

If $\mathbf{a}$ is EL and statistically independent of $\mathbf{b}$, then $\mathbf{c} = \mathbf{a} \oplus \mathbf{b}$ is also EL
and independent of $\mathbf{b}$.

In equation form this statement is

$$P[\mathbf{c} = \boldsymbol{\alpha}] = P[\mathbf{c} = \boldsymbol{\alpha} \mid \mathbf{b} = \boldsymbol{\beta}] = 2^{-N}; \qquad \text{for all } \boldsymbol{\alpha}, \boldsymbol{\beta}, \tag{6.19}$$

where $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are $N$-component binary vectors.

The proof of Eq. 6.19 is straightforward. From Eq. 6.17, if $\mathbf{c} = \mathbf{a} \oplus \mathbf{b}$,
then

$$\mathbf{c} \oplus \mathbf{b} = \mathbf{a} \oplus \mathbf{b} \oplus \mathbf{b} = \mathbf{a}. \tag{6.20a}$$

Thus $\mathbf{c} = \boldsymbol{\alpha}$ when $\mathbf{b} = \boldsymbol{\beta}$ if and only if

$$\mathbf{a} = \boldsymbol{\alpha} \oplus \boldsymbol{\beta}. \tag{6.20b}$$

But $\mathbf{a}$ is EL and independent of $\mathbf{b}$. Therefore for any $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$

$$P[\mathbf{c} = \boldsymbol{\alpha} \mid \mathbf{b} = \boldsymbol{\beta}] = P[\mathbf{a} = \boldsymbol{\alpha} \oplus \boldsymbol{\beta} \mid \mathbf{b} = \boldsymbol{\beta}] = P[\mathbf{a} = \boldsymbol{\alpha} \oplus \boldsymbol{\beta}] = 2^{-N}, \tag{6.20c}$$

and

$$P[\mathbf{c} = \boldsymbol{\alpha}] = \sum_{\text{all } \boldsymbol{\beta}} P[\mathbf{c} = \boldsymbol{\alpha} \mid \mathbf{b} = \boldsymbol{\beta}]\, P[\mathbf{b} = \boldsymbol{\beta}] = 2^{-N}. \tag{6.20d}$$

As claimed, $\mathbf{c}$ is EL and statistically independent of $\mathbf{b}$.
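The property just proved can be confirmed by brute force for small $N$. The following Python sketch uses exact rational arithmetic; the function names and the skewed distribution for $\mathbf{b}$ in the usage example are assumptions chosen for illustration. It builds the joint distribution of $(\mathbf{c}, \mathbf{b})$ for an EL vector $\mathbf{a}$ independent of $\mathbf{b}$, and tests both equalities of Eq. 6.19.

```python
from fractions import Fraction
from itertools import product


def xor(a, b):
    """Component-by-component modulo-2 sum."""
    return tuple(u ^ v for u, v in zip(a, b))


def joint_distribution_c_b(n, p_b):
    """Joint distribution of (c, b) with c = a XOR b, a EL and independent of b.

    p_b maps n-component binary tuples to their probabilities."""
    p_a = Fraction(1, 2**n)  # a is EL: every vector has probability 2^-n
    joint = {}
    for a in product((0, 1), repeat=n):
        for b, prob_b in p_b.items():
            key = (xor(a, b), b)
            joint[key] = joint.get(key, Fraction(0)) + p_a * prob_b
    return joint


def satisfies_eq_6_19(n, p_b):
    """True if P[c = alpha] = P[c = alpha | b = beta] = 2^-n for all alpha, beta."""
    joint = joint_distribution_c_b(n, p_b)
    target = Fraction(1, 2**n)
    for alpha in product((0, 1), repeat=n):
        marginal = sum(joint[(alpha, b)] for b in p_b)
        if marginal != target:
            return False
        for beta, prob_b in p_b.items():
            # The conditional probability is defined only where P[b = beta] > 0.
            if prob_b > 0 and joint[(alpha, beta)] / prob_b != target:
                return False
    return True


if __name__ == "__main__":
    # A deliberately non-uniform distribution for b (hypothetical values).
    skewed = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 3), (1, 0): Fraction(1, 6)}
    print(satisfies_eq_6_19(2, skewed))
```

The check succeeds however unevenly $\mathbf{b}$ is distributed, which is the point of the property: only $\mathbf{a}$ needs to be EL.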

We now invoke this property to show that if $\mathbf{x}_i$ is the input to a
parity-check coder then over the ensemble of encoder connections (1) the coder
output vector $\mathbf{y}_i$ is EL and (2) the coder output vector $\mathbf{y}_i$ is pairwise
statistically independent of the vector $\mathbf{y}_k$ produced by any other input $\mathbf{x}_k$,
$k \ne i$. These two results will be used to establish that

$$\overline{P_2[\mathcal{E}]} \le 2^{-N R_0}, \tag{6.21}$$

where $R_0$ is the error exponent for binary-waveform sequences given by
Eq. 5.36.

Proof that any $\mathbf{y}_i$ is EL follows from the fact (Eq. 6.18a) that

$$\mathbf{y}_i = \mathbf{f}_0 \oplus x_{i1} \mathbf{f}_1 \oplus x_{i2} \mathbf{f}_2 \oplus \cdots \oplus x_{iK} \mathbf{f}_K.$$

Letting $\mathbf{a}$ denote the modulo-2 sum of the input-dependent terms, we have

$$\mathbf{y}_i = \mathbf{f}_0 \oplus \mathbf{a}. \tag{6.22a}$$

But, over the ensemble, $\mathbf{f}_0$ is both EL and statistically independent of all
other $\mathbf{f}_h$, hence of $\mathbf{a}$. Accordingly, for any $i$, $\mathbf{y}_i$ is EL:

$$P[\mathbf{y}_i = \boldsymbol{\alpha}] = 2^{-N}; \qquad \text{for all } \boldsymbol{\alpha}. \tag{6.22b}$$

(Note, however, that if $\mathbf{f}_0$ were not included in the ensemble of connection
vectors the output $\mathbf{y}_0$ produced by the all-zero input sequence $\mathbf{x}_0$ would
also be identically zero, hence not EL.)

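Proof that the pair of output vectors $\mathbf{y}_i$ and $\mathbf{y}_k$ are statistically independent when $i \neq k$ follows from the observation that $\mathbf{x}_i$ and $\mathbf{x}_k$ differ in at least one component. Let $l$ denote such a component and assume initially that

$$x_{il} = 1, \quad x_{kl} = 0. \tag{6.23a}$$

We can therefore write

$$\mathbf{y}_i = \mathbf{f}_l \oplus \mathbf{b}, \tag{6.23b}$$

where $\mathbf{f}_l$ enters neither into $\mathbf{b}$ nor $\mathbf{y}_k$, hence is independent of both. Since $\mathbf{f}_l$ is EL and statistically independent of the pair $(\mathbf{b}, \mathbf{y}_k)$, so also is $\mathbf{y}_i$. Indeed, for all $N$-component binary vectors $\boldsymbol{\alpha}$, $\boldsymbol{\beta}$, $\boldsymbol{\gamma}$ we have

$$P[\mathbf{y}_i = \boldsymbol{\alpha} \mid \mathbf{b} = \boldsymbol{\beta}, \mathbf{y}_k = \boldsymbol{\gamma}] = P[\mathbf{f}_l = \boldsymbol{\alpha} \oplus \boldsymbol{\beta} \mid \mathbf{b} = \boldsymbol{\beta}, \mathbf{y}_k = \boldsymbol{\gamma}] = P[\mathbf{f}_l = \boldsymbol{\alpha} \oplus \boldsymbol{\beta}] = 2^{-N}. \tag{6.23c}$$

Thus

$$P[\mathbf{y}_i = \boldsymbol{\alpha} \mid \mathbf{y}_k = \boldsymbol{\gamma}] = \sum_{\text{all } \boldsymbol{\beta}} P[\mathbf{y}_i = \boldsymbol{\alpha} \mid \mathbf{b} = \boldsymbol{\beta}, \mathbf{y}_k = \boldsymbol{\gamma}]\, P[\mathbf{b} = \boldsymbol{\beta} \mid \mathbf{y}_k = \boldsymbol{\gamma}]$$

$$= 2^{-N} \sum_{\text{all } \boldsymbol{\beta}} P[\mathbf{b} = \boldsymbol{\beta} \mid \mathbf{y}_k = \boldsymbol{\gamma}] = 2^{-N} = P[\mathbf{y}_i = \boldsymbol{\alpha}]. \tag{6.23d}$$

For this proof, we have assumed that $x_{il} = 1$ and $x_{kl} = 0$. If on the contrary $x_{il} = 0$ and $x_{kl} = 1$, the statistical independence of $\mathbf{y}_i$ and $\mathbf{y}_k$ follows from interchanging the indices $i$ and $k$ in the preceding argument.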
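Both properties can be checked by brute force when $N$ and $K$ are small enough that every set of connection vectors can be listed. The following sketch is only a numerical illustration, not part of the proof; the sizes $N = 2$, $K = 2$ and all function names are our own choices.

```python
from collections import Counter
from itertools import product

N, K = 2, 2  # small enough to enumerate all 2**(N*(K+1)) coders

def encode(x, f):
    """y = f[0] XOR x[0]*f[1] XOR ... XOR x[K-1]*f[K], componentwise modulo 2."""
    y = list(f[0])
    for bit, fh in zip(x, f[1:]):
        if bit:
            y = [a ^ b for a, b in zip(y, fh)]
    return tuple(y)

vectors = list(product((0, 1), repeat=N))
inputs = list(product((0, 1), repeat=K))
coders = list(product(vectors, repeat=K + 1))  # every choice of (f_0, ..., f_K)

def is_uniform(counter, n_outcomes):
    """True when every one of n_outcomes values occurs equally often."""
    return len(counter) == n_outcomes and len(set(counter.values())) == 1

# (1) each output vector is equally likely to be any N-component binary vector
single_ok = all(
    is_uniform(Counter(encode(x, f) for f in coders), 2 ** N) for x in inputs
)

# (2) outputs for two different inputs are jointly uniform, hence independent
pair_ok = all(
    is_uniform(Counter((encode(xi, f), encode(xk, f)) for f in coders), 4 ** N)
    for xi in inputs for xk in inputs if xi != xk
)

print(single_ok, pair_ok)
```

Joint uniformity over all $2^{2N}$ pairs is equivalent to each vector being EL together with pairwise independence, which is why a single count suffices for the second check.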

With these two results, we now establish that the probability of error bound of Eq. 6.6 applies to the ensemble of binary codes $\{\mathbf{s}_i\}$ defined by the set of all equally likely parity-check encoders. Since each output vector $\mathbf{y}_i$ implies a definite signal vector $\mathbf{s}_i$, the pairwise statistical independence of the $\mathbf{y}_i$ implies pairwise statistical independence of the $\mathbf{s}_i$. Furthermore, since each $\mathbf{y}_i$ is equally likely to be any binary vector with components 0 and 1, each $\mathbf{s}_i$ is equally likely to be any binary vector with components $\pm\sqrt{E_N}$. Thus Eqs. 6.11a and b are both satisfied. We conclude that the probability of error for communication over an additive white Gaussian noise channel, when averaged over the ensemble of all parity-check-coded systems, satisfies

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]} \tag{6.24a}$$

with (from Eq. 5.36)

$$R_0 = 1 - \log_2\left(1 + e^{-E_N/N_0}\right). \tag{6.24b}$$

We have already noted (Eq. 5.42) that this value of $R_0$ is exponentially optimum for small values of the energy-to-noise ratio per degree of freedom,† $E_N/N_0$. Under these conditions the use of a transmitter whose complexity grows only linearly with $K$, hence with the signal duration $T$, does not imply a loss in signaling efficiency.

Multiamplitude codes. Parity-check coders also provide an effective escape from the difficulty of storing an exponentially large set of multiamplitude-component signal vectors $\{\mathbf{s}_i\}$ for use on channels with high energy-to-noise ratio per degree of freedom. An appropriate coder for such a condition is shown in Fig. 6.8. Whenever $A$, the number of signal amplitudes (alphabet letters) to which we wish to assign the signal components, is a power of 2, we use

$$N' = N \log_2 A$$

stages in the y-register, instead of only $N$. As usual, $N$ is the number of dimensions occupied by the signal set $\{\mathbf{s}_i\}$.

Figure 6.8 A multiamplitude parity-check coder. The transducer produces one component of $\mathbf{s}$ from each successive block of $\log_2 A$ components of $\mathbf{y}$.

The amplitudes of the coefficients of the signal vectors are obtained from the output of the y-register by feeding $\mathbf{y}$, $\log_2 A$ digits at a time, into the transducer shown in Fig. 6.8. Clearly, these $\log_2 A$ digits may be used to specify one of $A$ different amplitudes. In a typical case, say $A = 8$, the transducer might be a digital-to-analog convertor specified by the transformations

    Input    Output                       Input    Output
    0 1 1    $+\tfrac{1}{7}\sqrt{E_N}$    1 0 0    $-\tfrac{1}{7}\sqrt{E_N}$
    0 1 0    $+\tfrac{3}{7}\sqrt{E_N}$    1 0 1    $-\tfrac{3}{7}\sqrt{E_N}$
    0 0 1    $+\tfrac{5}{7}\sqrt{E_N}$    1 1 0    $-\tfrac{5}{7}\sqrt{E_N}$
    0 0 0    $+\sqrt{E_N}$                1 1 1    $-\sqrt{E_N}$

† The "degrees of freedom" of a signal set $\{s_i(t)\}$ is defined as the number, $N$, of orthonormal functions $\{\varphi_j(t)\}$ used in its construction, i.e., as the signal set dimensionality.


We now show that the multiamplitude random coding bound of Eq. 5.55 applies for an ensemble of parity-check coders and a fixed transducer. The $2^{N'(K+1)}$ distinct coder connections are assumed to be equally likely. Since $\mathbf{s}_i$ and $\mathbf{s}_k$ depend only on $\mathbf{y}_i$ and $\mathbf{y}_k$, respectively, and (as in the binary transducer case) $\mathbf{y}_i$ and $\mathbf{y}_k$ are pairwise statistically independent for any $i$ and $k \neq i$, it is clear that $\mathbf{s}_i$ and $\mathbf{s}_k$ are pairwise statistically independent. Furthermore, for all $i$ each component $y_{ij}$, $j = 1, 2, \ldots, N'$, is statistically independent of all other components in $\mathbf{y}_i$ and equally likely to be 0 or 1. Thus each component of any vector $\mathbf{s}_i$ at the transducer output is statistically independent of all of the other components in $\mathbf{s}_i$ and, when $A$ is a power of two, is equally likely to be any one of the $A$ possible values. The conditions of Eqs. 6.11a and b for the validity of the random coding bound are therefore met, and we again have

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]} \tag{6.25a}$$

in which $R_0$ is given by Eq. 5.56, with $A$ a power of 2 and

$$p_l = \frac{1}{A}; \quad l = 1, 2, \ldots, A. \tag{6.25b}$$


The simple coding strategy just considered is sufficient to attain near exponential optimality whenever the energy-to-noise ratio per degree of freedom is such that Shannon's upper bound, $R_0^*$, is closely approximated by the $R_0$ of Eq. 5.56, as plotted in Fig. 5.17. If this cannot be accomplished satisfactorily with $A$'s that are powers of 2 and equally likely $p_l$'s, matters can be improved by elaborating the procedure at the cost of making $N'/N > \log_2 A$. For example, if $A = 3$ is satisfactory, we can use $N'/N = 3$ and the transducer mapping

    0 1 1,  0 1 0,  0 0 1   =>   $+\sqrt{E_N}$
    0 0 0,  1 0 0           =>   $0$
    1 0 1,  1 1 0,  1 1 1   =>   $-\sqrt{E_N}$

The signaling amplitudes $+\sqrt{E_N}$, $0$, $-\sqrt{E_N}$ are now used with the unequal probabilities $\tfrac{3}{8}$, $\tfrac{1}{4}$, $\tfrac{3}{8}$; however, as we found in Chapter 5, it is desirable to use the amplitude zero with a probability less than 1/3 when $A = 3$. Clearly, any $A$ and any desired probability set $\{p_l\}$ can be approximated by making $N'/N$ sufficiently large and using an appropriate transducer mapping.
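The amplitude probabilities produced by such a transducer follow by counting, since each block of three y-register digits is equally likely to be any of the eight binary triples. The sketch below does the counting for the $A = 3$ example. The blocks assigned to $\pm\sqrt{E_N}$ are those listed in the text; assigning the two remaining blocks to amplitude zero is an assumption, as are the names used.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Blocks mapped to +sqrt(E_N) and -sqrt(E_N); amplitudes are in units of sqrt(E_N).
PLUS = {(0, 1, 1), (0, 1, 0), (0, 0, 1)}
MINUS = {(1, 0, 1), (1, 1, 0), (1, 1, 1)}

def transducer(block):
    """Map one 3-digit block to an amplitude; unlisted blocks go to 0 (assumed)."""
    if block in PLUS:
        return 1
    if block in MINUS:
        return -1
    return 0

# Every 3-digit block is equally likely, so probabilities are counts over 8.
counts = Counter(transducer(b) for b in product((0, 1), repeat=3))
probs = {amp: Fraction(n, 8) for amp, n in counts.items()}
print(probs)
```

Replacing the two sets by other partitions of longer blocks gives a quick way to see which probability sets $\{p_l\}$ a given $N'/N$ can realize.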

Invariance of $P[\mathcal{E} \mid m_k]$. We noted in Section 4.5 that certain completely symmetric signal sets $\{\mathbf{s}_i\}$, such as the set of $M$ orthogonal signals and the set of $M$ simplex signals, exhibit the following important property: with equally likely messages and white Gaussian noise, the optimum (that is, the maximum likelihood) receiver yields a probability of error which is independent of the signal actually transmitted,

$$P[\mathcal{E} \mid m_k] = P[\mathcal{E} \mid m_0]; \quad \text{for } k = 0, 1, \ldots, M - 1. \tag{6.26}$$

We show in this section that every binary parity-check code also exhibits this property.

The first step in proving this invariance is to observe the effect of adding an arbitrary $N$-component binary vector, say

$$\mathbf{a} = (a_1, a_2, \ldots, a_N),$$

modulo-2 to each of the encoder output vectors $\{\mathbf{y}_i\}$: the $j$th component of every $\mathbf{y}_i$ is complemented when $a_j = 1$ and is left unaltered when $a_j = 0$. By “complement” we mean the transformation

$$0 \to 1, \qquad 1 \to 0.$$

Since with a binary code the transmitted vectors $\{\mathbf{s}_i\}$ are obtained from the $\{\mathbf{y}_i\}$ by

$$y_{ij} = 0 \Rightarrow s_{ij} = -\sqrt{E_N}$$

$$y_{ij} = 1 \Rightarrow s_{ij} = +\sqrt{E_N}$$

and

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\,\varphi_j(t),$$

the effect on the $\{s_i(t)\}$ of adding $\mathbf{a}$ to each $\mathbf{y}_i$ is to transform

$$\varphi_j(t) \to -\varphi_j(t)$$

for all $j$ such that $a_j = 1$. Such a transformation does not affect the orthonormality of the $\{\varphi_j(t)\}$. Since the minimum probability of error with additive white Gaussian noise is invariant to the particular choice of $\{\varphi_j(t)\}$ (cf. Chapter 4), for any binary parity-check code we have

$$P[\mathcal{E} \mid m_k, \{\mathbf{y}_i\}] = P[\mathcal{E} \mid m_k, \{\mathbf{y}_i \oplus \mathbf{a}\}]; \quad \text{for all } k \text{ and any } \mathbf{a}. \tag{6.27}$$

An immediate implication of Eq. 6.27 is that the minimum error probability for any binary parity-check coder such as that diagrammed in Fig. 6.6 is independent of the choice of connection vector $\mathbf{f}_0$. In particular, setting $\mathbf{a} = \mathbf{f}_0$ in Eq. 6.27 is equivalent to having chosen $\mathbf{f}_0 = \mathbf{0}$ initially; in the binary case, although including $\mathbf{f}_0$ in the ensemble of codes simplifies the proof that the ensemble obeys the error probability bound, its inclusion has no effect on the actual error behavior of any code in the ensemble. Note, however, that this statement is not true in general when the number of amplitudes in the transmitter alphabet $A$ is greater than two; in the multiamplitude case $\mathbf{f}_0$ enters into determination of the magnitude of the signal coefficients $\{s_{ij}\}$ rather than only into the determination of their sign.

We are now in a position to prove that any particular binary parity-check code obeys Eq. 6.26, so that the error probability of the maximum likelihood receiver is independent of which message is transmitted.

Without disturbing the error probabilities, we may take $\mathbf{f}_0 = \mathbf{0}$. The coder output vector $\mathbf{y}$ is then related to the coder input vector $\mathbf{x} = (x_1, x_2, \ldots, x_K)$ by

$$\mathbf{y} = x_1\mathbf{f}_1 \oplus x_2\mathbf{f}_2 \oplus \cdots \oplus x_K\mathbf{f}_K, \tag{6.28}$$

in which the $\{\mathbf{f}_h\}$ are the connection vectors of the particular code under consideration. The key to proving Eq. 6.26 is to note that Eq. 6.28 implies the following property:

If $\mathbf{a}$ is any member of $\{\mathbf{y}_i\}$, the two sets $\{\mathbf{y}_i \oplus \mathbf{a}\}$, $i = 0, 1, \ldots, M - 1$, and $\{\mathbf{y}_i\}$ both comprise the same vectors.

As an example, if $\mathbf{a} = \mathbf{y}_2$ and the $\{\mathbf{y}_i\}$ are

    y_0 = 0 0 0 0 0
    y_1 = 0 1 0 1 1
    y_2 = 1 1 0 1 0
    y_3 = 1 0 0 0 1,

then

    y_0 ⊕ a = 1 1 0 1 0 = y_2
    y_1 ⊕ a = 1 0 0 0 1 = y_3
    y_2 ⊕ a = 0 0 0 0 0 = y_0
    y_3 ⊕ a = 0 1 0 1 1 = y_1.

The $\{\mathbf{y}_i \oplus \mathbf{a}\}$ differ from the $\{\mathbf{y}_i\}$ by a relabeling of subscripts.

The general proof of this closure property depends on the fact that Eq. 6.28 is linear in the sense that if the coder input is $\mathbf{x}_i \oplus \mathbf{x}_k$ the output is $\mathbf{y}_i \oplus \mathbf{y}_k$. For fixed $k$ the coder input set $\{\mathbf{x}_i \oplus \mathbf{x}_k\}$, $i = 0, 1, \ldots, M - 1$, contains each of the $2^K$ binary vectors of length $K$ once and only once. Thus $\{\mathbf{x}_i \oplus \mathbf{x}_k\}$ is a relabeling of $\{\mathbf{x}_i\}$, which implies that when $\mathbf{a} = \mathbf{y}_k$, $\{\mathbf{y}_i \oplus \mathbf{a}\}$ is a relabeling of $\{\mathbf{y}_i\}$. Codes for which this is so are called “group codes.”$^{66,76}$
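The closure property is easy to confirm by direct computation on the four-word example given earlier. In the sketch below the helper names are our own; `relabeling` returns, for each $i$, the subscript of the code word equal to $\mathbf{y}_i \oplus \mathbf{a}$.

```python
def xor(u, v):
    """Componentwise modulo-2 sum of two binary vectors."""
    return tuple(a ^ b for a, b in zip(u, v))

# The example code words y_0, ..., y_3 from the text.
code = [
    (0, 0, 0, 0, 0),
    (0, 1, 0, 1, 1),
    (1, 1, 0, 1, 0),
    (1, 0, 0, 0, 1),
]

def relabeling(code, a):
    """perm[i] is the index of y_i XOR a within the code.

    list.index raises ValueError if the sum is not a code word,
    i.e. if the closure property fails.
    """
    return [code.index(xor(y, a)) for y in code]

# Adding any code word permutes the code.
perms = {k: relabeling(code, code[k]) for k in range(len(code))}
print(perms[2])  # relabeling produced by a = y_2
```

For $\mathbf{a} = \mathbf{y}_2$ the permutation reproduces the relabeling $0 \to 2$, $1 \to 3$, $2 \to 0$, $3 \to 1$ of the worked example.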

Proof of Eq. 6.26 follows from the closure property. From Eq. 6.27 we know that

$$P[\mathcal{E} \mid m_k, \{\mathbf{y}_i \oplus \mathbf{a}\}] = P[\mathcal{E} \mid m_k, \{\mathbf{y}_i\}]. \tag{6.29a}$$

If we choose $\mathbf{a} = \mathbf{y}_k$, then

$$\mathbf{y}_k \oplus \mathbf{a} = \mathbf{0} = \mathbf{y}_0. \tag{6.29b}$$

Thus the transmitted vector with the code $\{\mathbf{y}_i \oplus \mathbf{a}\}$ when $m = m_k$ is the same as the transmitted vector with the code $\{\mathbf{y}_i\}$ when $m = m_0$. Since the remaining signal vectors are also the same, we have

$$P[\mathcal{E} \mid m_k, \{\mathbf{y}_i \oplus \mathbf{a}\}] = P[\mathcal{E} \mid m_0, \{\mathbf{y}_i\}]. \tag{6.29c}$$

Equating the right-hand sides of Eqs. 6.29a and c yields Eq. 6.26, which was to be proved.

An immediate corollary of Eq. 6.26 is that the probability of error resulting when any binary parity-check code is used over an additive white Gaussian noise channel is invariant to the actual a priori probabilities $\{P[m_i]\}$ whenever the receiver is maximum likelihood, hence optimum for equally likely messages. This corollary provides additional cogent justification for the equally likely a priori probability assumption; in accordance with the discussion of minimax receivers in Section 4.5, any receiver that is optimum for equally likely message inputs and for which $P[\mathcal{E} \mid m_k]$ is independent of $k$ is also minimax.

Unfortunately, with multiamplitude codes the invariance of $P[\mathcal{E} \mid m_k]$ to $k$ is lost in the asymmetric transformation $\{\mathbf{y}_i\} \to \{\mathbf{s}_i\}$. In principle significant sensitivity of the error probability with respect to $m_k$ can be remedied in multiamplitude codes by means of an appropriate expurgation procedure. For example, if we denote by $P$ the error probability that would result if a given $N$-dimension, $K$-bit code, abbreviated $(N, K)$, were used with equal a priori probabilities, then

$$P = \sum_{k} P[m_k]\, P[\mathcal{E} \mid m_k] = \frac{1}{M} \sum_{k=0}^{M-1} P[\mathcal{E} \mid m_k]. \tag{6.30a}$$

Clearly, no more than half of the $P[\mathcal{E} \mid m_k]$ can be greater than twice $P$, since otherwise their average would exceed $P$. If we delete those members of $\{\mathbf{s}_i\}$ for which

$$P[\mathcal{E} \mid m_k] > 2P, \tag{6.30b}$$

we have left a new code consisting of at least $2^{K-1}$ signals for each of which

$$P[\mathcal{E} \mid m_i] \le 2P. \tag{6.30c}$$

Moreover, the rate of this expurgated code in bits per dimension,

$$R_N = \frac{K - 1}{N}, \tag{6.30d}$$

is very nearly equal to the original unexpurgated rate, $K/N$, when $K$ is large. The difficulty with the expurgation procedure is that one needs to know the $\{P[\mathcal{E} \mid m_i]\}$ in order to apply it: as already pointed out many times, in general we cannot hope to calculate all $2^K$ of these conditional probabilities when $K$ is large.
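The counting argument behind Eqs. 6.30 can be illustrated on a toy example. The conditional error probabilities below are hypothetical numbers chosen only for illustration (they are not computed from any code), and `expurgate` is our own helper name.

```python
def expurgate(cond_err):
    """Keep the messages whose conditional error probability is at most 2P.

    cond_err[k] plays the role of P[E | m_k]; P is the equal-prior average
    of Eq. 6.30a. Returns (P, indices of the retained messages).
    """
    mean = sum(cond_err) / len(cond_err)
    kept = [k for k, p in enumerate(cond_err) if p <= 2 * mean]
    return mean, kept

# Hypothetical P[E | m_k] for an M = 8 (K = 3) code; illustrative values only.
cond_err = [0.001, 0.002, 0.050, 0.001, 0.003, 0.040, 0.002, 0.001]
mean, kept = expurgate(cond_err)
print(mean, kept)
```

Here two of the eight messages exceed $2P$ and are deleted, so six remain, comfortably more than the guaranteed $2^{K-1} = 4$.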

Orthogonal and simplex codes. Parity-check coders may also be used to generate orthogonal and simplex signals, with $N = 2^K$ and $2^K - 1$, respectively. It is particularly interesting that with this technique each resulting signal vector, say $\mathbf{s}_i$, is binary; that is,

$$\mathbf{s}_i = (s_{i1}, s_{i2}, \ldots, s_{iN}); \quad i = 0, 1, \ldots, 2^K - 1,$$

with

$$s_{ij} = +\sqrt{E_N} \text{ or } -\sqrt{E_N}; \quad \text{for all } i, j.$$

To see how to generate such signal sets, consider the case $K = 2$ and $N = 2^K = 4$. We take $\mathbf{f}_0 = \mathbf{0}$ and choose the parity-check coder-connection vectors $\mathbf{f}_1$ and $\mathbf{f}_2$ to be

    f_1 = 1 0 1 0
    f_2 = 1 1 0 0.                                   (6.31a)

Then, in accordance with Eq. 6.28, the $\{\mathbf{y}_i\}$ are

    y_0 = 0 0 0 0
    y_1 = 1 0 1 0
    y_2 = 1 1 0 0
    y_3 = 0 1 1 0.                                   (6.31b)

The corresponding binary vectors $\{\mathbf{s}_i\}$ are

$$\mathbf{s}_0 = \sqrt{E_N}\,(-1, -1, -1, -1)$$

$$\mathbf{s}_1 = \sqrt{E_N}\,(+1, -1, +1, -1)$$

$$\mathbf{s}_2 = \sqrt{E_N}\,(+1, +1, -1, -1)$$

$$\mathbf{s}_3 = \sqrt{E_N}\,(-1, +1, +1, -1).$$



It is apparent that the dot product of any two vectors $\mathbf{s}_i$ and $\mathbf{s}_k$ is

$$\mathbf{s}_i \cdot \mathbf{s}_k = N E_N \delta_{ik}. \tag{6.32}$$

Thus these vectors $\{\mathbf{s}_i\}$ form an orthogonal set, and each has length $\sqrt{N E_N}$.

The reason for the orthogonality of the $\{\mathbf{s}_i\}$ becomes clear when we consider the structure of the $\{\mathbf{f}_h\}$. Each $\mathbf{f}_h$ consists of alternate groups of 1's and 0's. In $\mathbf{f}_1$ the groups are of length $2^0$; in $\mathbf{f}_2$ the groups are of length $2^1$. It is because of this that each vector $\mathbf{y}_k$ differs from every other vector $\mathbf{y}_i$ in exactly $N/2$ coordinates, which fact in turn implies orthogonality between the $\{\mathbf{s}_i\}$.

We now prove for every $K$ and $N = 2^K$ that, if the coder-connection vectors are alternate groups of 1's and 0's, with $\mathbf{f}_h$ having groups of length $2^{h-1}$, the resulting coder generates a set of $2^K$ orthogonal vectors. Let $\mathbf{f}_h$, $h = 1, 2, \ldots, k$, denote the connection vectors for the case ($N = 2^k$, $K = k$) and let $\mathbf{g}_h$, $h = 1, 2, \ldots, k + 1$, denote the connection vectors for the case ($N = 2^{k+1}$, $K = k + 1$). The alternate grouping implies

$$\mathbf{g}_1 = (\mathbf{f}_1, \mathbf{f}_1)$$

$$\mathbf{g}_2 = (\mathbf{f}_2, \mathbf{f}_2)$$

$$\vdots$$

$$\mathbf{g}_k = (\mathbf{f}_k, \mathbf{f}_k), \tag{6.33a}$$

in which we use the notation

$$(\mathbf{z}_1, \mathbf{z}_2) = (z_{11}, z_{12}, \ldots, z_{1N}, z_{21}, z_{22}, \ldots, z_{2N}). \tag{6.33b}$$

The $(k + 1)$th connection vector is

$$\mathbf{g}_{k+1} = (\underbrace{1, 1, \ldots, 1}_{2^k \text{ 1's}}, \underbrace{0, 0, \ldots, 0}_{2^k \text{ 0's}}). \tag{6.33c}$$


Proof of orthogonality is by induction. Assume that the set of vectors $\{\mathbf{s}_i\}$ is orthogonal for $K = k$. From Eqs. 6.28 and 6.33 the set of signals for $K = k + 1$, say $\{\mathbf{s}_i'\}$, can be written in terms of the signals $\{\mathbf{s}_i\}$ for $K = k$ as

$$\mathbf{s}_{2i}' = (\mathbf{s}_i, \mathbf{s}_i)$$

$$\mathbf{s}_{2i+1}' = (-\mathbf{s}_i, \mathbf{s}_i); \quad \text{for } i = 0, 1, 2, \ldots, 2^k - 1. \tag{6.34}$$

Equations 6.34 follow from the fact that since $x_{k+1}$ by convention is the least significant digit, $\mathbf{g}_{k+1}$ enters into the determination of $\mathbf{s}_{2i+1}'$ but not into the determination of $\mathbf{s}_{2i}'$, for all $i \le 2^k - 1$. The effect of including $\mathbf{g}_{k+1}$ is to change the sign of the first $2^k$ components of the signal vector that would result if $\mathbf{g}_{k+1}$ were not included.

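From Eq. 6.33b,

$$(\mathbf{z}_1, \mathbf{z}_2) \cdot (\mathbf{z}_3, \mathbf{z}_4) = (\mathbf{z}_1 \cdot \mathbf{z}_3) + (\mathbf{z}_2 \cdot \mathbf{z}_4).$$

By virtue of the orthogonality (assumed for the induction) between the $2^k$-component vectors $\{\mathbf{s}_i\}$, we therefore have

$$(\mathbf{s}_i, \mathbf{s}_i) \cdot (\mathbf{s}_l, \mathbf{s}_l) = 2^k E_N \delta_{il} + 2^k E_N \delta_{il} = 2^{k+1} E_N \delta_{il}, \tag{6.35a}$$

$$(-\mathbf{s}_i, \mathbf{s}_i) \cdot (-\mathbf{s}_l, \mathbf{s}_l) = 2^k E_N \delta_{il} + 2^k E_N \delta_{il} = 2^{k+1} E_N \delta_{il}, \tag{6.35b}$$

$$(\mathbf{s}_i, \mathbf{s}_i) \cdot (-\mathbf{s}_l, \mathbf{s}_l) = -2^k E_N \delta_{il} + 2^k E_N \delta_{il} = 0. \tag{6.35c}$$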


Thus the orthogonality of the signal vectors in the case $K = k$ guarantees the orthogonality in the case $K = k + 1$. Since we have seen that the theorem is true for $K = 2$, the proof is complete. That the theorem is also true for $K = 1$ is obvious by inspection. (It is convenient to begin the induction argument with $K = 2$ because of the insight afforded into the structure of the $\{\mathbf{f}_h\}$.)
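The construction can also be checked numerically for the first few values of $K$. The sketch below builds the alternating-group connection vectors, encodes every input with $\mathbf{f}_0 = \mathbf{0}$ (Eq. 6.28), maps digits to $\pm 1$ (amplitudes in units of $\sqrt{E_N}$), and tests Eq. 6.32. Function names are our own.

```python
from itertools import product

def connection_vectors(K):
    """f_h, h = 1..K, of length N = 2**K: alternate groups of 1's and 0's,
    each group of length 2**(h-1), starting with a group of 1's."""
    N = 2 ** K
    return [[1 - (j // 2 ** (h - 1)) % 2 for j in range(N)]
            for h in range(1, K + 1)]

def codewords(K):
    """All 2**K outputs y = x_1 f_1 XOR ... XOR x_K f_K."""
    f = connection_vectors(K)
    words = []
    for x in product((0, 1), repeat=K):
        y = [0] * 2 ** K
        for bit, fh in zip(x, f):
            if bit:
                y = [a ^ b for a, b in zip(y, fh)]
        words.append(y)
    return words

def is_orthogonal(K):
    """Check s_i . s_l = N * delta_il with digits 0 -> -1 and 1 -> +1."""
    s = [[2 * b - 1 for b in y] for y in codewords(K)]
    N = 2 ** K
    return all(
        sum(a * b for a, b in zip(s[i], s[l])) == (N if i == l else 0)
        for i in range(len(s)) for l in range(len(s))
    )

print(connection_vectors(2))
print([is_orthogonal(K) for K in range(1, 6)])
```

For $K = 2$ the generated vectors are those of Eq. 6.31a, and the check succeeds for each $K$ tried.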

The advantage of generating orthogonal waveforms in this way is 
obvious; from an engineering point of view their generation is relatively 
simple. Of course, this is also true of short pulses positioned in time so 
that they do not overlap. With parity-check waveforms, however, the 
problem of a high peak-power requirement is avoided, as illustrated in 
Fig. 6.9. 

To obtain a set of $2^K$ simplex waveforms it is only necessary to modify the coder just described by deleting the $N$th stage of the y-register, leaving the first $2^K - 1$ stages unchanged. This corresponds to deleting the last component of each of the $\{\mathbf{f}_h\}$. Since our choice of the $\{\mathbf{f}_h\}$ was such that $f_{hN} = 0$ for all $h$, it follows that $y_{iN} = 0$ for all $\mathbf{y}_i$; thus truncating the code words to length $2^K - 1$ does not affect the error performance of the $\{\mathbf{s}_i\}$. (Recall that in Chapter 4 simplex waveforms were obtained from orthogonal waveforms by a translation that did not affect $P[\mathcal{E}]$.)



Figure 6.9 Peak amplitudes required with typical binary and pulse-position orthogonal signals, N = 64.


Proof that the set of signal vectors resulting from this truncation does form a simplex is trivial. Letting $\{\mathbf{s}_i\}$ denote the original orthogonal signal vectors of length $2^K$ and $\{\mathbf{s}_i'\}$ the truncated set, we observe that the $(2^K)$th component always contributes the term $(+E_N)$ to $\mathbf{s}_i \cdot \mathbf{s}_l$. Thus

$$\mathbf{s}_i' \cdot \mathbf{s}_l' = \mathbf{s}_i \cdot \mathbf{s}_l - E_N = \begin{cases} (2^K - 1)E_N; & \text{for } i = l \\ -E_N; & \text{for } i \neq l. \end{cases} \tag{6.36}$$

Since Eq. 6.36 reduces to the simplex definition of Eq. 4.99 when $2^K$ is identified with $M$ and $ME_N$ with $E_s$, the proof is complete.

The encoder for simplex vectors in the case $K = 3$, $N = 2^K - 1 = 7$, is shown in Fig. 6.10. The connection vectors are

    f_1 = 1 0 1 0 1 0 1
    f_2 = 1 1 0 0 1 1 0
    f_3 = 1 1 1 1 0 0 0.                             (6.37)
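As a numerical check of Eq. 6.36 for this case, the sketch below encodes all eight inputs with the connection vectors of Eq. 6.37 and forms every pairwise dot product, with amplitudes in units of $\sqrt{E_N}$. Variable names are our own.

```python
from itertools import product

# Connection vectors f_1, f_2, f_3 of Eq. 6.37 (K = 3, N = 7).
F = [
    (1, 0, 1, 0, 1, 0, 1),
    (1, 1, 0, 0, 1, 1, 0),
    (1, 1, 1, 1, 0, 0, 0),
]

def encode(x):
    """y = x_1 f_1 XOR x_2 f_2 XOR x_3 f_3 (Eq. 6.28 with f_0 = 0)."""
    y = [0] * 7
    for bit, fh in zip(x, F):
        if bit:
            y = [a ^ b for a, b in zip(y, fh)]
    return y

# Map digit 0 -> -1 and digit 1 -> +1.
signals = [[2 * b - 1 for b in encode(x)] for x in product((0, 1), repeat=3)]

# gram[i][l] = s_i . s_l in units of E_N.
gram = [[sum(a * b for a, b in zip(si, sl)) for sl in signals] for si in signals]
for row in gram:
    print(row)
```

Each diagonal entry equals $2^K - 1 = 7$ and each off-diagonal entry equals $-1$, as Eq. 6.36 requires.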

Figure 6.10 Parity-check encoder for simplex ($K = 3$, $N = 2^K - 1 = 7$): $\mathbf{y} = (x_1 \oplus x_2 \oplus x_3,\; x_2 \oplus x_3,\; x_1 \oplus x_3,\; x_3,\; x_1 \oplus x_2,\; x_2,\; x_1)$.

It is interesting to note that each column on the right-hand side of Eqs. 6.37 represents a different one of the $(2^K - 1)$ distinct non-null parity-check connections. It can be shown that this is true for all $K$: simplex codes with $2^K$ words can be generated by performing all distinct non-null parity checks on a sequence of $K$ message bits. An especially simple implementation of a simplex coder is a $K$-bit shift register connected in a maximal-length feedback configuration.$^{36,86}$ An example is shown in Fig. 6.11.
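The shift-register form can be sketched in a few lines. The feedback rule used below, $y_{n+3} = y_n \oplus y_{n+1}$, is an assumption inferred from the output sequence listed with Fig. 6.11, not a transcription of the figure's wiring; the function names are likewise our own.

```python
from itertools import product

def lfsr_encode(x, length=7):
    """Load (x_1, x_2, x_3), then extend with y[n+3] = y[n] XOR y[n+1]
    (assumed tap choice) until `length` digits have been produced."""
    y = list(x)
    while len(y) < length:
        y.append(y[-3] ^ y[-2])
    return tuple(y)

def hamming(u, v):
    """Number of coordinates in which two words differ."""
    return sum(a != b for a, b in zip(u, v))

words = [lfsr_encode(x) for x in product((0, 1), repeat=3)]

# In a simplex code every pair of distinct words differs in (N + 1)/2 = 4 places.
distances = {hamming(u, v) for u in words for v in words if u != v}
print(lfsr_encode((1, 0, 0)), distances)
```

Stepping the register produces, in order, $x_1, x_2, x_3, x_1 \oplus x_2, x_2 \oplus x_3, x_1 \oplus x_2 \oplus x_3, x_1 \oplus x_3$, and all pairs of words are equidistant.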

Discussion. Much study has been devoted to parity-check coders, particularly for the binary signal case. Catalogs of optimum, that is, minimum $P[\mathcal{E}]$, binary codes have been compiled$^{66}$ for many cases in which either $K$ or $N - K$ or both are small. The known techniques for finding optimum codes are essentially those of exhaustive enumeration and evaluation and usually cannot be applied when both $K$ and $(N - K)$ are large. No general algorithm is known for constructing explicit codes for which it can be proved that the probability of error is overbounded by Eq. 6.6.

At first glance it is startling that the error probability averaged over all $(N, K)$ binary codes is, in general, smaller than the behavior of the best code for which the error probability can be calculated. In some sense it appears that it is the absence of simple structure that makes a code good. Unfortunately, however, it is not possible to calculate the error probability of specific large codes that are not highly structured.

Figure 6.11 Maximal length shift register encoder for simplex ($N = 7$, $K = 3$): $\mathbf{y} = (x_1,\; x_2,\; x_3,\; x_1 \oplus x_2,\; x_2 \oplus x_3,\; x_1 \oplus x_2 \oplus x_3,\; x_1 \oplus x_3)$. Operate switches after $\mathbf{x}$ is loaded into x-register and step seven times.
6.2 RECEIVER QUANTIZATION 

We have been studying the problem of building a transmitter that is capable of efficiently communicating one of

$$M = 2^{N R_N} = 2^{RT} \tag{6.38a}$$

messages even when $N R_N$ is large. For the additive white Gaussian noise channel we have observed that it is not difficult to construct an ensemble in which the transmitters are easily implemented and for which the bound

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]} \tag{6.38b}$$

is satisfied with an $R_0$ that is nearly optimum.

The problem of implementing an efficient receiver is not so easily resolved. The bound of Eq. 6.38b was derived under the assumption that each member of the ensemble of communication systems has an optimum receiver. Optimum receivers for signals

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\,\varphi_j(t); \quad i = 0, 1, \ldots, M - 1 \tag{6.39}$$

have been studied in Chapter 4. As illustrated in Fig. 6.12, one implementation is a bank of $N$ filters matched to the $\{\varphi_j(t)\}$, followed by circuits that compute the $M$ dot products

$$\mathbf{r} \cdot \mathbf{s}_i = \sum_{j=1}^{N} r_j s_{ij} \tag{6.40a}$$

and determine for which $i$ the decision variable

$$\xi_i = \mathbf{r} \cdot \mathbf{s}_i - \tfrac{1}{2}|\mathbf{s}_i|^2; \quad i = 0, 1, \ldots, M - 1 \tag{6.40b}$$

is maximum. (In Eq. 6.40 and throughout this chapter we assume equal a priori message probabilities.)

Figure 6.12 An optimum receiver realization: $\mathbf{r} = (r_1, r_2, \ldots, r_N)$. The filter outputs are sampled at $t = T$.

Clearly, the complexity of implementing the bank of matched filters illustrated in Fig. 6.12 grows no faster than linearly with $N$. Indeed, as stated in Section 6.1 in connection with modulator design, the complexity is independent of $N$ if the $\{\varphi_j(t)\}$ are chosen to be nonoverlapping time-translates of a single waveform of duration $\tau$. As shown in Fig. 6.13, in this case we can use a single matched filter and sample its output at times $j\tau$, $j = 1, 2, \ldots, N$.

On the other hand, there remains the problem of calculating the set of decision variables $\{\xi_i\}$. At first glance it might appear that a high-speed digital computer could resolve this difficulty. But for large $T$ this is not so; from Eqs. 6.40 the number of calculations involved in determining the $\{\xi_i\}$ is $NM$. For $R = 1000$ bits/sec and $T = \tfrac{1}{10}$ sec, we have

$$M = 2^{RT} = 2^{100} > 10^{30},$$

which in a serial computer would allow $10^{-22}$ nanosecond for computing each sum. One cannot trifle with exponential growth.

An alternative is to calculate the $\{\xi_i\}$ in time-parallel rather than in time-sequence, perhaps by resistor weighting networks and summing busses. But this would require approximately $NM$ resistors, and exponential growth in number of components is no more attractive than exponential growth in speed of computation. In general, the only recourse is to accept a receiver that is less than optimum.
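The arithmetic behind the serial-computer estimate is easily reproduced. The sketch below assumes the values $R = 1000$ bits/sec and $T = 0.1$ sec and supposes that all $M$ sums must be completed within one signaling interval $T$; the variable names are ours.

```python
R = 1000   # information rate, bits/sec
T = 0.1    # signal duration, sec (assumed value)

M = 2 ** round(R * T)            # number of messages, 2**(RT)
time_per_sum_ns = T / M * 1e9    # time available per dot product, nanoseconds

print(M > 10 ** 30)
print(time_per_sum_ns)
```

The time available per sum comes out just under $10^{-22}$ nanosecond, and doubling $T$ squares $M$, which makes the allowance smaller by a further thirty orders of magnitude.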

Once we are reconciled to some loss in performance, the problem is to determine receiving procedures with acceptable degradation. In this framework special-purpose digital computers, called decoders, assume a role of central importance, primarily because of the great flexibility with which they process data.

Figure 6.13 Optimum receiver realization for a time-translated orthonormal set $\{\varphi_j(t)\}$. Filter impulse response: $h(t) = \varphi_1(\tau - t)$.

When a decoder is used, the performance degradation arises from two sources. First, the vector $\mathbf{r}$ at the output of the matched-filter bank has components $\{r_j\}$ that are defined on a continuum, whereas a digital computer operates only with discrete numbers. Thus some form of amplitude quantization is usually introduced ahead of the computer. Second, the number of computations demanded of the computer must be restricted to growing no faster than linearly with the signal duration $T$. The first source of degradation is considered in the remainder of this section; the second is considered in Section 6.4.

Measure of Degradation 

It is evident that transforming the $N$-component vector $\mathbf{r}$ into a discrete vector suitable for computer processing is an irreversible operation and in general degrades the attainable error performance. It is intuitively reasonable that this degradation will be small if the quantization is extremely fine. On the other hand, coarse quantization is desirable because it decreases the memory requirements, hence the cost of the decoder: if each component of $\mathbf{r}$ is quantized into one of $Q$ levels ($Q$ a power of 2), $N \log_2 Q$ bits of memory are required to store the quantized vector in the computer.

The appropriate engineering balance between system cost and per- 
formance cannot be adjudged without some quantitative measure of the 
effect of quantization on the probability of error. An especially useful 
measure of degradation in a coding situation is provided by the exponent 
in the random coding bound. 

Heretofore the exponential parameter $R_0$ in the bound $\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}$ has been determined only for an ensemble of communication systems utilizing parity-check coders, transducers, and optimum (unquantized) receivers. We now consider the parameter $R_0'$ in a corresponding bound $\overline{P[\mathcal{E}]} < 2^{-N[R_0' - R_N]}$ for an ensemble of systems with the same transmitters but with receivers having the structure of Fig. 6.14, in which a quantizer $Q$ is inserted between matched filter and decoder. The decoder itself is assumed to be optimum in the sense that it determines, from the quantized vector $\mathbf{r}'$ and knowledge of the signal set $\{\mathbf{s}_i\}$, which message has maximum a posteriori probability $P[m_i \mid \mathbf{r}' = \boldsymbol{\gamma}, \{\mathbf{s}_i\}]$. The difference between $R_0$ and $R_0'$ provides a meaningful measure of the degradation due to quantization.†

† Methods other than the direct quantization of each component of $\mathbf{r}$ may also be used to produce a discrete decoder input vector. (See problem 6.10.)

Figure 6.14 Quantized receiver. The matched-filter outputs are sampled at $t = T$; the decoder input vector is $\mathbf{r}' = (r_1', r_2', \ldots, r_N')$.

The Quantized-Channel Model 

With an additive white Gaussian noise channel, each component $s_j$ of the transmitted vector $\mathbf{s}$ is corrupted by the addition of a statistically independent Gaussian noise variable. Thus, if $\{a_l\}$ denotes the transmitter alphabet, when $s_j = a_l$ the $j$th component of the (unquantized) received vector $\mathbf{r}$ is described by the density function

$$p_{r_j}(\gamma \mid s_j = a_l) = \frac{1}{\sqrt{\pi N_0}}\, e^{-(\gamma - a_l)^2 / N_0}. \tag{6.41}$$

As illustrated in Fig. 6.15, the quantizer maps $r_j$ into an output component $r_j'$ that cannot assume an arbitrary value on the real line but is restricted to being some letter of the quantizer output alphabet, say $\{b_h\}$, $h = 1, 2, \ldots, Q$. Given that $s_j$ is $a_l$, we denote the probability that $r_j'$ is $b_h$ by the symbol $q_{lh}$:

$$q_{lh} = P[r_j' = b_h \mid s_j = a_l]. \tag{6.42}$$

Figure 6.15 Input-output relations of a quantizer. The interval of $r$ corresponding to $r' = b_h$ is $\Delta_h$.

Figure 6.16 The transition probability $q_{lh}$ is the area of the shaded region under $p_{r_j}(\alpha \mid s_j = a_l)$.

As shown in Fig. 6.16, the value of $q_{lh}$ for any particular quantizer is the integral of the Gaussian density function of Eq. 6.41 over the $h$th quantization interval. The set of probabilities $\{q_{lh}\}$, $l = 1, 2, \ldots, A$, $h = 1, 2, \ldots, Q$, specifies the probabilistic connections between the transmitter alphabet $\{a_l\}$ and the quantizer output alphabet $\{b_h\}$. The $\{q_{lh}\}$, called transition probabilities, may be conveniently displayed in a diagram (see Fig. 6.17a) when $A$ and $Q$ are small and in a matrix (see Fig. 6.17b) when $A$ and $Q$ are large.
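Each $q_{lh}$ is a difference of two Gaussian distribution-function values, so the whole matrix is simple to tabulate. The sketch below assumes noise of variance $N_0/2$ per component and a quantizer described by its sorted interior thresholds; the particular amplitudes, thresholds, and function names are illustrative choices of our own.

```python
import math

def gaussian_cdf(x, mean, variance):
    """P[r <= x] for a Gaussian variable with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))

def transition_matrix(amplitudes, thresholds, N0):
    """q[l][h] = P[r' = b_h | s = a_l].

    thresholds: sorted interior boundaries of the Q quantization intervals,
    so len(thresholds) == Q - 1. Noise variance is taken as N0 / 2.
    """
    edges = [-math.inf] + list(thresholds) + [math.inf]
    var = N0 / 2.0
    return [
        [gaussian_cdf(hi, a, var) - gaussian_cdf(lo, a, var)
         for lo, hi in zip(edges, edges[1:])]
        for a in amplitudes
    ]

# Illustrative case: binary alphabet (+1, -1 with E_N = 1), Q = 4, N0 = 1.
q = transition_matrix([-1.0, 1.0], [-0.5, 0.0, 0.5], N0=1.0)
for row in q:
    print([round(v, 4) for v in row])
```

Each row sums to one, and for this symmetric quantizer the second row is the first row reversed.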

Figure 6.17 Transition probability diagram and matrix: (a) diagram, $A = 3$, $Q = 4$; (b) matrix; the elements in each row sum to one. In the interest of clarity not all possible transitions are shown in the diagram.

The components of the Gaussian noise vector which the channel adds to the signal vector $\mathbf{s}$ are statistically independent. If we assume that each matched filter output $r_j$, $j = 1, 2, \ldots, N$, is subjected to an identical quantizer, each component is affected independently by the same set of transition probabilities $\{q_{lh}\}$. Thus if

$$\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_N)$$

$$\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \ldots, \gamma_N),$$

then

$$P[\mathbf{r}' = \boldsymbol{\gamma} \mid \mathbf{s} = \boldsymbol{\alpha}] = \prod_{j=1}^{N} P[r_j' = \gamma_j \mid s_j = \alpha_j]. \tag{6.43}$$

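Here, each $\alpha_j$ may be any member of the transmitter alphabet $\{a_l\}$ and each $\gamma_j$ may be any member of the quantizer output alphabet $\{b_h\}$. For example, if

$$\boldsymbol{\alpha} = (a_6, a_2, a_5)$$

$$\boldsymbol{\gamma} = (b_3, b_7, b_1),$$

then, in accordance with Eqs. 6.42 and 6.43,

$$P[\mathbf{r}' = \boldsymbol{\gamma} \mid \mathbf{s} = \boldsymbol{\alpha}] = q_{6,3}\, q_{2,7}\, q_{5,1}.$$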
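In computational terms Eq. 6.43 is a product of table look-ups. The sketch below uses a small hypothetical transition matrix (values chosen only for illustration) and zero-based indices, so `alpha_idx[j]` and `gamma_idx[j]` select the row and column of the $j$th factor.

```python
import math

def vector_likelihood(q, alpha_idx, gamma_idx):
    """P[r' = gamma | s = alpha] as the product over components of q[l][h]."""
    return math.prod(q[l][h] for l, h in zip(alpha_idx, gamma_idx))

# Hypothetical transition matrix with A = 2 rows and Q = 3 columns.
q = [
    [0.7, 0.2, 0.1],
    [0.1, 0.2, 0.7],
]

# alpha = (a_1, a_2, a_2), gamma = (b_1, b_3, b_2) in the one-based notation of the text.
likelihood = vector_likelihood(q, (0, 1, 1), (0, 2, 1))
print(likelihood)
```

The three factors here are $0.7$, $0.7$, and $0.2$, one per received component.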

Calculation of R 0 ' 

We are now in a position to calculate a bound on the mean probability of error for an ensemble of quantized receiver communication systems. We assume that the connection vectors for the parity-check encoders are EL and statistically independent over the ensemble. Consequently, the signals $\{\mathbf{s}_i\}$ are statistically independent by pairs:

$$P[\mathbf{s}_i = \boldsymbol{\alpha}, \mathbf{s}_k = \boldsymbol{\beta}] = P[\mathbf{s}_i = \boldsymbol{\alpha}]\, P[\mathbf{s}_k = \boldsymbol{\beta}]; \quad \text{all } i, k \neq i. \tag{6.44a}$$

Furthermore, the components of each signal are statistically independent,

$$P[\mathbf{s}_i = \boldsymbol{\alpha}] = \prod_{j=1}^{N} P[s_{ij} = \alpha_j]; \quad i = 0, 1, \ldots, M - 1, \tag{6.44b}$$

and for each $a_l$ in the transmitter alphabet

$$P[s_{ij} = a_l] = p_l; \quad \text{all } i, j. \tag{6.44c}$$

The $\{p_l\}$ depend on the choice of $N'$ and of the digital-analog transducer of Fig. 6.8.

Formulation of the bound. The derivation of the bound

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0' - R_N]} \tag{6.45}$$

is formulated in a manner identical to that we have encountered before. We start with the union bound

$$P[\mathcal{E} \mid m_k] \le \sum_{\substack{i=0 \\ (i \neq k)}}^{M-1} P_2[\mathbf{s}_i, \mathbf{s}_k]. \tag{6.46}$$

By virtue of Eqs. 6.44 the expectation of $P_2[\mathbf{s}_i, \mathbf{s}_k]$ over the ensemble of systems is independent of the subscripts $i$ and $k$. Hence

$$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} = \overline{P_2[\mathcal{E}]}; \quad \text{all } i, k \neq i \tag{6.47a}$$

and

$$\overline{P[\mathcal{E}]} = \overline{P[\mathcal{E} \mid m_k]} < M\, \overline{P_2[\mathcal{E}]}. \tag{6.47b}$$

Since $M = 2^{N R_N}$, we need only show that

$$\overline{P_2[\mathcal{E}]} \le 2^{-N R_0'} \tag{6.47c}$$

to obtain the desired bound of Eq. 6.45.

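Verification of Eq. 6.47c and evaluation of $R_0'$ remain to be done. We recall that $P_2[\mathbf{s}_i, \mathbf{s}_k]$ is the probability of error when $\mathbf{s}_i$ and $\mathbf{s}_k$ are used to communicate one of two equally likely messages. Let $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ be two vectors with components in the transmitter alphabet $\{a_l\}$, let $\boldsymbol{\gamma}$ be a vector with components in the receiver alphabet $\{b_h\}$, and assume for the moment that

$$\mathbf{s}_i = \boldsymbol{\alpha}, \quad \mathbf{s}_k = \boldsymbol{\beta}, \quad \mathbf{r}' = \boldsymbol{\gamma}. \tag{6.48}$$

The optimum decoder in this two-message case makes an error when $m_k$ is the transmitter input and $\mathbf{r}' = \boldsymbol{\gamma}$ is received if and only if

$$P[\mathbf{r}' = \boldsymbol{\gamma} \mid \mathbf{s}_i = \boldsymbol{\alpha}] \ge P[\mathbf{r}' = \boldsymbol{\gamma} \mid \mathbf{s}_k = \boldsymbol{\beta}]. \tag{6.49}$$

Equivalent forms of the error condition are

$$\ln \frac{P[\mathbf{r}' = \boldsymbol{\gamma} \mid \mathbf{s}_i = \boldsymbol{\alpha}]}{P[\mathbf{r}' = \boldsymbol{\gamma} \mid \mathbf{s}_k = \boldsymbol{\beta}]} \ge 0$$

and, in view of Eq. 6.43,

$$\sum_{j=1}^{N} \ln \frac{P[r_j' = \gamma_j \mid s_{ij} = \alpha_j]}{P[r_j' = \gamma_j \mid s_{kj} = \beta_j]} \ge 0. \tag{6.50}$$

We simplify notation by defining

$$z_j = \ln \frac{P[r_j' = \gamma_j \mid s_{ij} = \alpha_j]}{P[r_j' = \gamma_j \mid s_{kj} = \beta_j]}; \quad \text{for } j = 1, 2, \ldots, N \tag{6.51a}$$

and

$$Z = \sum_{j=1}^{N} z_j. \tag{6.51b}$$

Given $\mathbf{s}_i = \boldsymbol{\alpha}$, $\mathbf{s}_k = \boldsymbol{\beta}$, and $\mathbf{r}' = \boldsymbol{\gamma}$, an optimum receiver in our two-message case computes the number $Z$, and sets $\hat{m} = m_k$ if $Z < 0$. Thus, when $m = m_k$, an error is made if and only if

$$Z \ge 0. \tag{6.52}$$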

Ensemble averaging. Over the ensemble of codes and channel noise, we recognize that $s_{ij}$, $s_{kj}$, and $r_j'$ are random variables, with $s_{ij}$ and $s_{kj}$ ranging over the transmitter alphabet $\{a_l\}$ and $r_j'$ ranging over the quantizer alphabet $\{b_h\}$. Hence $z_j$, which is uniquely determined for stated values of $s_{ij}$, $s_{kj}$, and $r_j'$, is also a random variable. In particular, from Eqs. 6.42 and 6.51 we have

$$(s_{ij} = a_u,\; s_{kj} = a_l,\; r_j' = b_h) \Rightarrow z_j = \ln \frac{q_{uh}}{q_{lh}}; \quad 1 \le u, l \le A, \quad 1 \le h \le Q. \tag{6.53}$$

The probability assignment for $z_j$ follows from the statistical independence of $s_{ij}$ and $s_{kj}$. We have

$$P[s_{ij} = a_u, s_{kj} = a_l, r_j' = b_h \mid m_k] = P[s_{ij} = a_u]\, P[s_{kj} = a_l]\, P[r_j' = b_h \mid m_k, s_{kj} = a_l] = p_u\, p_l\, q_{lh}. \tag{6.54}$$

Finally, by virtue of Eqs. 6.43 and 6.44b we note that the random variables $\{z_j\}$, $j = 1, 2, \ldots, N$, are statistically independent, hence that $Z$ is a sum of statistically independent, identically distributed random variables.

We shall bound $\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]}$, using the technique first introduced in the derivation of the Chernoff bound in Chapter 2. If we define the unit step function

$$f(Z) = \begin{cases} 1; & Z \ge 0 \\ 0; & Z < 0, \end{cases} \tag{6.55a}$$

then, from Eq. 6.52, we have

$$\overline{P_2[\mathbf{s}_i, \mathbf{s}_k]} = \overline{P_2[\mathcal{E}]} = \overline{f(Z)}, \tag{6.55b}$$

where the average is over the joint ensemble of codes and channel disturbances.

Evaluation of the bound. Direct evaluation of $\overline{f(Z)}$ is not, in general,
possible. For the special case of no quantization, which was described in
Chapter 5, the corresponding bound on $\overline{P_2[\mathcal{E}]}$ was expressed in terms of
the function $Q(\ )$ rather than the unit step function $f(\ )$. The averaging
was easily carried out after the substitution of an exponential bound on
the $Q(\ )$ function. A similar strategy, which we now adopt, is to over-
bound the unit step $f(Z)$ by the exponential $e^{\lambda Z}$.

Figure 6.18 Exponential overbound to the unit step.

As shown in Fig. 6.18,

$$f(Z) \le e^{\lambda Z}, \quad \text{for any } \lambda \ge 0. \tag{6.56}$$

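Substitution of Eq. 6.56 in Eq. 6.55b yields

$$\overline{P_2[\mathcal{E}]} = \overline{f(Z)} \le \overline{e^{\lambda Z}} = E\left[\exp\left(\lambda \sum_{j=1}^{N} z_j\right)\right] = E\left[\prod_{j=1}^{N} e^{\lambda z_j}\right] = \prod_{j=1}^{N} \overline{e^{\lambda z_j}} = \left[\overline{e^{\lambda z}}\right]^{N}; \quad \lambda \ge 0, \tag{6.57}$$

in which $z$ denotes any one of the identically distributed, statistically
independent random variables $\{z_j\}$. Defining

$$R_0'(\lambda) \triangleq -\log_2 \overline{e^{\lambda z}}, \tag{6.58a}$$

we have

$$\overline{P_2[\mathcal{E}]} \le 2^{-N R_0'(\lambda)}; \quad \lambda \ge 0. \tag{6.58b}$$

In the derivation of Eq. 6.58 we have exploited the fact that the
random variables $\{z_j\}$, hence the random variables $\{e^{\lambda z_j}\}$, are statistically
independent; this enables us to equate the mean of the product of the
$\{e^{\lambda z_j}\}$ and the product of their means. Indeed, the motivation for adopting
the exponential bound of Eq. 6.56 is that it permits exploitation of this
fact.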

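The bound of Eq. 6.58b is valid for any $\lambda \ge 0$. We now choose the
parameter $\lambda$ in such a way that the bound is as tight as possible. From
Eqs. 6.53 and 6.54,

$$\overline{e^{\lambda z}} = \sum_{u} \sum_{l} \sum_{h} p_u p_l q_{lh} \exp\left[\lambda \ln \frac{q_{uh}}{q_{lh}}\right] = \sum_{u} \sum_{l} \sum_{h} p_u p_l\, q_{uh}^{\lambda}\, q_{lh}^{1-\lambda}, \tag{6.59}$$

where the indices $u$ and $l$ run from 1 to $A$ and $h$ runs from 1 to $Q$. We
therefore seek the value of $\lambda$ for which

$$0 = \frac{d}{d\lambda}\left(\overline{e^{\lambda z}}\right) = \sum_{u} \sum_{l} \sum_{h} p_u p_l\, q_{lh}^{1-\lambda}\, q_{uh}^{\lambda} (\ln q_{uh} - \ln q_{lh}). \tag{6.60a}$$

Since the right-hand side of Eq. 6.60a is symmetrical in the indices $u$ and $l$
(with equal exponents, each term with indices $(u, l)$ is cancelled by the term
with indices $(l, u)$), the solution to Eq. 6.60a is

$$\lambda = \tfrac{1}{2}, \tag{6.60b}$$

which, of course, satisfies the condition $\lambda \ge 0$. If we define

$$R_0' \triangleq R_0'(\lambda)_{\max} = R_0'(\tfrac{1}{2}) = -\log_2 \sum_{u} \sum_{l} \sum_{h} p_u p_l \sqrt{q_{uh}\, q_{lh}}, \tag{6.61a}$$

Eq. 6.58b becomes

$$\overline{P_2[\mathcal{E}]} \le 2^{-N R_0'}. \tag{6.61b}$$

Substitution of Eq. 6.61b in Eq. 6.47b yields the desired result

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0' - R_N]}. \tag{6.62a}$$

By virtue of the symmetry of Eq. 6.61a with respect to the indices $l$ and $u$
the expression for $R_0'$ may also be written

$$R_0' = -\log_2 \sum_{h=1}^{Q} \left[\sum_{l=1}^{A} p_l \sqrt{q_{lh}}\right]^2. \tag{6.62b}$$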
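The choice $\lambda = \tfrac{1}{2}$ can be checked numerically. The following Python sketch evaluates Eq. 6.59 and Eq. 6.58a on a grid of $\lambda$ values and compares the largest exponent with Eq. 6.62b. The two-input, three-output transition matrix `q`, the letter probabilities `p`, and all function names are hypothetical, chosen only for illustration.

```python
import math

def mean_exp(lam, p, q):
    """Eq. 6.59: sum over u, l, h of p_u p_l q_uh^lam q_lh^(1 - lam)."""
    A, Q = len(p), len(q[0])
    return sum(p[u] * p[l] * q[u][h] ** lam * q[l][h] ** (1.0 - lam)
               for u in range(A) for l in range(A) for h in range(Q))

def r0_prime_lambda(lam, p, q):
    """Eq. 6.58a: R_0'(lambda) = -log2 of the mean of exp(lambda z)."""
    return -math.log2(mean_exp(lam, p, q))

def r0_prime(p, q):
    """Eq. 6.62b: -log2 sum_h [ sum_l p_l sqrt(q_lh) ]^2."""
    Q = len(q[0])
    return -math.log2(sum(
        sum(pl * math.sqrt(row[h]) for pl, row in zip(p, q)) ** 2
        for h in range(Q)))

# Hypothetical channel: each row of q sums to 1 (assumption for illustration).
p = [0.5, 0.5]
q = [[0.7, 0.2, 0.1],
     [0.1, 0.2, 0.7]]

lams = [k / 100 for k in range(0, 101)]
best = max(lams, key=lambda lam: r0_prime_lambda(lam, p, q))
print("best lambda on grid:", best)
print("R_0'(1/2):", r0_prime_lambda(0.5, p, q), " Eq. 6.62b:", r0_prime(p, q))
```

Because Eq. 6.59 is unchanged when $\lambda$ is replaced by $1 - \lambda$ and is convex in $\lambda$, the grid search should land on $\lambda = 0.5$ for any valid `p` and `q`, not only for the symmetric example used here.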

Discussion. Equation 6.62 provides a bound on $\overline{P[\mathcal{E}]}$ that is valid for
any set of probabilities $\{p_l\}$. For a given set of transition probabilities
the $\{p_l\}$ may be optimized by use of the formulas of Appendix 5C.

Although Eq. 6.62 has been derived with reference to the quantized
additive white Gaussian noise channel, its validity does not depend on the
specific mechanism that produces the transition probabilities $\{q_{lh}\}$.† For
any discrete channel described by the diagram in Fig. 6.17a or the matrix
in Fig. 6.17b we may communicate one of $M = 2^{N R_N}$ messages by means
of a parity-check-encoded signal set $\{\mathbf{s}_i\}$ with components in $\{a_l\}$. As long
as each component of $\mathbf{s}$ is affected independently by the transition proba-
bilities $\{q_{lh}\}$, the ensemble average error probability is bounded by
Eq. 6.62.

† Since the $\{q_{lh}\}$ are probabilities, they must satisfy the conditions $q_{lh} \ge 0$ and $\sum_h q_{lh} = 1$.

Increasingly Fine Quantization

We now consider the limiting behavior of $R_0'$ as the quantization grain
becomes increasingly fine. By so doing we obtain an exponential bound
on $\overline{P[\mathcal{E}]}$ that applies to unquantized-receiver vector channels that are far
more general than those disturbed solely by additive Gaussian noise.
Indeed, the conditional probability density function of any unquantized
received signal component $r_j$, given the transmitted signal component $s_j$,
may be quite arbitrary. We need only require that this conditional
density function is continuous and the same for all components $j = 1,
2, \ldots, N$, and that successive components of $\mathbf{r}$ are disturbed with statistical
independence:


$$p_{\mathbf{r} \mid \mathbf{s}} = \prod_{j=1}^{N} p_{r_j \mid s_j} \tag{6.63a}$$

$$p_{r_j \mid s_j} = p_{r \mid s}; \quad j = 1, 2, \ldots, N. \tag{6.63b}$$

If we quantize each component $r_j$ as shown in Fig. 6.15, then in accord-
ance with Fig. 6.16 we have

$$q_{lh} = \int_{\Delta_h} p_r(\gamma \mid s = a_l)\, d\gamma, \tag{6.64a}$$

in which $\Delta_h$ is the $h$th quantization interval. When each $\Delta_h$ is sufficiently
small, Eq. 6.64a can be written

$$q_{lh} \approx p_r(b_h \mid s = a_l)\, \Delta_h, \tag{6.64b}$$

in which $b_h$ is now taken as the midpoint of the interval $\Delta_h$.

In accordance with Eq. 6.64b the expression for $R_0'$ in Eq. 6.62b may
be written

$$R_0' \approx -\log_2 \sum_{\text{all } h} \left[\sum_{l=1}^{A} p_l \sqrt{p_r(b_h \mid s = a_l)\, \Delta_h}\right]^2 = -\log_2 \sum_{\text{all } h} \left[\sum_{l=1}^{A} p_l \sqrt{p_r(b_h \mid s = a_l)}\right]^2 \Delta_h.$$

In the limit as the quantization grid becomes increasingly fine the sum on
$h$ becomes an integral and the approximation of Eq. 6.64b is exact. Thus
for the unquantized case the conditions of Eq. 6.63 suffice to establish the
general result

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}, \tag{6.65a}$$

with

$$R_0 = -\log_2 \int_{-\infty}^{\infty} \left[\sum_{l=1}^{A} p_l \sqrt{p_r(\gamma \mid s = a_l)}\right]^2 d\gamma. \tag{6.65b}$$

For the particular case in which the disturbance is independent additive
noise, $r = s + n$, this becomes

$$R_0 = -\log_2 \int_{-\infty}^{\infty} \left[\sum_{l=1}^{A} p_l \sqrt{p_n(\gamma - a_l)}\right]^2 d\gamma. \tag{6.65c}$$

Equation 6.65c can be used to substantiate our intuitive contention that
extremely fine quantization does not introduce degradation in attainable
error performance. As an example, we consider a particular system with
binary modulation and a quantized receiver operating over an additive
white Gaussian noise channel. We compare the value of $R_0'$, approxi-
mated by the $R_0$ of Eq. 6.65c if the receiver is finely quantized, with the
exponent obtained under the same conditions except that the receiver is
unquantized.

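Let

$$a_1 = +\sqrt{E_N}, \quad a_2 = -\sqrt{E_N} \tag{6.66a}$$

and choose

$$p_l = P[s_{ij} = a_l] = \tfrac{1}{2}; \quad l = 1, 2. \tag{6.66b}$$

For unquantized Gaussian noise

$$p_n(\gamma) = \frac{1}{\sqrt{\pi N_0}}\, e^{-\gamma^2 / N_0}.$$

In accordance with Eq. 6.65c, we have

$$R_0 = -\log_2 \int_{-\infty}^{\infty} \left[\frac{1}{2}\left(\frac{1}{\sqrt{\pi N_0}}\, e^{-(\gamma - \sqrt{E_N})^2 / N_0}\right)^{1/2} + \frac{1}{2}\left(\frac{1}{\sqrt{\pi N_0}}\, e^{-(\gamma + \sqrt{E_N})^2 / N_0}\right)^{1/2}\right]^2 d\gamma$$

$$= -\log_2 \frac{1}{4} \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} \left[e^{-(\gamma - \sqrt{E_N})^2 / N_0} + e^{-(\gamma + \sqrt{E_N})^2 / N_0} + 2 e^{-(\gamma^2 + E_N) / N_0}\right] d\gamma$$

$$= -\log_2 \tfrac{1}{4}\left(1 + 1 + 2 e^{-E_N / N_0}\right)$$

$$= 1 - \log_2\left(1 + e^{-E_N / N_0}\right). \tag{6.67}$$

This agrees with the unquantized-receiver result of Eq. 6.24b.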
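The closed form of Eq. 6.67 can be checked against a direct numerical evaluation of Eq. 6.65c. The sketch below is only an illustration: it normalizes $N_0 = 1$, uses a plain trapezoidal rule over a finite interval, and all function names, the step count, and the integration span are assumptions rather than anything taken from the text.

```python
import math

def r0_unquantized_numeric(en_over_n0, steps=20000, span=12.0):
    """Eq. 6.65c for a_1 = +sqrt(E_N), a_2 = -sqrt(E_N), p_1 = p_2 = 1/2.

    N_0 is normalized to 1 (assumption); the integral is truncated to
    [-a - span, a + span] and evaluated with the trapezoidal rule.
    """
    n0 = 1.0
    a = math.sqrt(en_over_n0 * n0)

    def pn(x):
        # Gaussian noise density with variance N_0 / 2
        return math.exp(-x * x / n0) / math.sqrt(math.pi * n0)

    lo, hi = -a - span, a + span
    dy = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * dy
        g = (0.5 * math.sqrt(pn(y - a)) + 0.5 * math.sqrt(pn(y + a))) ** 2
        total += g * (0.5 if k in (0, steps) else 1.0)
    return -math.log2(total * dy)

def r0_closed_form(en_over_n0):
    """Eq. 6.67: R_0 = 1 - log2(1 + exp(-E_N / N_0))."""
    return 1.0 - math.log2(1.0 + math.exp(-en_over_n0))

for ratio in (0.1, 1.0, 4.0):
    print(ratio, r0_unquantized_numeric(ratio), r0_closed_form(ratio))
```

The two columns should agree to many decimal places, since the integrand is smooth and decays rapidly outside the chosen span.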

The $R_0$ of Eq. 6.67 has been obtained via a bounding technique,
$f(Z) \le e^{\lambda Z}$, that at first appears quite weak. That this $R_0$ agrees for
$E_N/N_0 \ll 1$ with Shannon's optimum bound $R_0^*$ may seem surprising.
The agreement, however, is consistent with the statement in Chapter 2
that the Chernoff bound is exponentially tight.

Comparison of Quantization Schemes

We now apply the results of the analysis of $R_0'$ to the evaluation of
certain interesting quantization schemes. As usual, we assume that the
transmission is corrupted by additive white Gaussian noise.

Figure 6.19 Quantizer for binary symmetric channel; A = 2, Q = 2.

Binary input, binary output. In the first case that we consider the
transmitter alphabet consists of only two allowable input amplitudes,

$$a_1 = +\sqrt{E_N}, \quad a_2 = -\sqrt{E_N}. \tag{6.68}$$

The matched filter output at the receiver is also quantized into two levels,
as shown in Fig. 6.19. Thus $A = 2$, $Q = 2$, and the overall channel
diagram is that of Fig. 6.20, in which

$$q_{12} = q_{21} = p \tag{6.69a}$$

$$q_{11} = q_{22} = 1 - p \tag{6.69b}$$

$$p = Q\left(\sqrt{2 E_N / N_0}\right). \tag{6.69c}$$

Figure 6.20 Transition diagram for binary symmetric channel.

The transition diagram is that of a binary symmetric channel (BSC).
Because of the symmetry of this channel, the probability assignment
$p_1 = p_2 = \tfrac{1}{2}$ is optimum. From Eq. 6.62b we then have

$$R_0' = -\log_2 \sum_{h=1}^{2} \left[\sum_{l=1}^{2} p_l \sqrt{q_{lh}}\right]^2$$

$$= -\log_2 \left[\left(\tfrac{1}{2}\sqrt{p} + \tfrac{1}{2}\sqrt{1 - p}\right)^2 + \left(\tfrac{1}{2}\sqrt{p} + \tfrac{1}{2}\sqrt{1 - p}\right)^2\right]$$

$$= 1 - \log_2\left[1 + 2\sqrt{p(1 - p)}\right]. \tag{6.70}$$

Figure 6.21 $R_0$ and $R_0'$ for binary antipodal signaling with two- and three-level sym-
metric quantization. (Horizontal axis: energy ratio per dimension, $10 \log_{10} E_N/N_0$.)

Figure 6.22 Quantizer for null-zone channel; A = 2, Q = 3.

The value of $R_0'$ from Eq. 6.70 is plotted in Fig. 6.21 as a function of
$E_N/N_0$, together with the unquantized $R_0$ given by Eq. 6.67. We observe
that the quantization loss is approximately 2 db. More precisely, in the
limit $E_N/N_0 \to 0$ (hence $p \to \tfrac{1}{2}$) it can be shown that the loss in decibels is
exactly $10 \log_{10}(2/\pi)$, about 1.96 db in magnitude.
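The low-energy limit quoted above can be examined numerically. For small $E_N/N_0$ both exponents are nearly proportional to $E_N/N_0$, so the ratio $R_0'/R_0$ approximates the factor by which the energy must be raised to compensate for binary quantization. The sketch below assumes Eq. 6.70 for the quantized exponent and Eq. 6.67 for the unquantized one; the function names and the test value of $E_N/N_0$ are illustrative choices.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), expressed through math.erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def r0_bsc(en_over_n0):
    """R_0' for binary input, binary output: Eqs. 6.69c and 6.70."""
    p = q_function(math.sqrt(2.0 * en_over_n0))
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))

def r0_unquantized(en_over_n0):
    """R_0 for binary antipodal signaling, unquantized receiver: Eq. 6.67."""
    return 1.0 - math.log2(1.0 + math.exp(-en_over_n0))

x = 1e-3                                  # small E_N / N_0 (illustrative)
ratio = r0_bsc(x) / r0_unquantized(x)
loss_db = 10.0 * math.log10(ratio)
print("R_0'/R_0 =", ratio, " 2/pi =", 2.0 / math.pi)
print("loss in db =", loss_db)
```

At larger values of $E_N/N_0$ the two curves are no longer proportional, so the loss there has to be read as a horizontal displacement between the curves, as in Fig. 6.21, rather than from this ratio.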

Binary input, ternary output. A significant fraction of the degradation
in $R_0'$ resulting from binary quantization can be avoided by going to a
ternary output. For $A = 2$, $Q = 3$ the appropriate quantizer is that shown
in Fig. 6.22 and the resulting over-all channel diagram is that of Fig. 6.23.
We have

$$q_{12} = q_{21} = p \tag{6.71a}$$

$$q_{13} = q_{23} = w \tag{6.71b}$$

and

$$q_{11} = q_{22} = 1 - p - w, \tag{6.71c}$$

where $p$ and $w$ are given in terms of the quantizer threshold $J$ by the
equations

$$p = \int_{J}^{\infty} \frac{1}{\sqrt{\pi N_0}}\, e^{-(\gamma + \sqrt{E_N})^2 / N_0}\, d\gamma \tag{6.72a}$$

$$w = \int_{-J}^{J} \frac{1}{\sqrt{\pi N_0}}\, e^{-(\gamma + \sqrt{E_N})^2 / N_0}\, d\gamma. \tag{6.72b}$$

Figure 6.23 Transition probability diagram for null-zone channel.

Such a channel is called either a null-zone channel or a binary symmetric
erasure channel (abbreviated BSEC). By symmetry we again choose
$p_1 = p_2 = \tfrac{1}{2}$. Then, from Eq. 6.62b, we have

$$R_0' = -\log_2 \sum_{h=1}^{3} \left[\sum_{l=1}^{2} p_l \sqrt{q_{lh}}\right]^2$$

$$= -\log_2 \left[\left(\tfrac{1}{2}\sqrt{1 - p - w} + \tfrac{1}{2}\sqrt{p}\right)^2 + \left(\tfrac{1}{2}\sqrt{w} + \tfrac{1}{2}\sqrt{w}\right)^2 + \left(\tfrac{1}{2}\sqrt{1 - p - w} + \tfrac{1}{2}\sqrt{p}\right)^2\right]$$

$$= 1 - \log_2\left[1 + w + 2\sqrt{p(1 - p - w)}\right]. \tag{6.73}$$

The value of $R_0'$ given by Eq. 6.73 is a function of the quantizer
threshold value, $J$. The optimum value of $J$ (the value that maximizes
$R_0'$) can be found as a function of $E_N/N_0$ by trial and error; it is plotted
in Fig. 6.24. The value of $R_0'$ resulting from Eq. 6.73 when $J$ is optimum
is plotted as a function of $E_N/N_0$ in Fig. 6.21. We observe that the
degradation from the unquantized case is roughly 1 db and conclude that
little improvement can be gained by quantizing to more than three levels
when $E_N/N_0$ is sufficiently small that signaling with sequences of binary
waveforms is efficient. Note in Fig. 6.24 that $J = 0.65\sqrt{N_0/2}$ is near
optimum over this interesting range of $E_N/N_0$.

Figure 6.24 Optimum threshold for null-zone channel. (Horizontal axis: energy ratio per dimension, $10 \log_{10} E_N/N_0$.)
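The trial-and-error search for the threshold can be imitated with a simple grid search. The sketch below expresses the integrals of Eq. 6.72 through the Gaussian tail function $Q(\ )$, measures $J$ in units of $\sqrt{N_0/2}$, and evaluates Eq. 6.73; the function names, the grid, and the operating point $E_N/N_0 = 1$ are assumptions made for illustration.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), expressed through math.erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def r0_null_zone(en_over_n0, j_norm):
    """Eq. 6.73, with the threshold J given in units of sqrt(N_0 / 2)."""
    a = math.sqrt(2.0 * en_over_n0)      # sqrt(E_N) in units of sqrt(N_0 / 2)
    p = q_function(a + j_norm)           # Eq. 6.72a: crossover probability
    w = q_function(a - j_norm) - p       # Eq. 6.72b: null-zone probability
    return 1.0 - math.log2(1.0 + w + 2.0 * math.sqrt(p * (1.0 - p - w)))

x = 1.0                                  # E_N / N_0 (illustrative, 0 db)
grid = [k / 100 for k in range(0, 301)]  # J from 0 to 3 sqrt(N_0 / 2)
best_j = max(grid, key=lambda j: r0_null_zone(x, j))
best_r0 = r0_null_zone(x, best_j)
print("best J / sqrt(N_0/2):", best_j, " R_0':", best_r0)
print("J = 0 (binary quantizer):", r0_null_zone(x, 0.0))
```

Setting `j_norm = 0` removes the null zone, so the function then reproduces the binary-output exponent of Eq. 6.70, which gives a convenient consistency check.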

Multiamplitude inputs. Quantization at the receiver also implies a
degraded $R_0'$ for systems that employ a multiamplitude modulator to
exploit a high energy-to-noise ratio per dimension. For a given input
alphabet $\{a_l\}$ and a given quantization grid the first step in evaluating the
degradation is to determine the transition probabilities $\{q_{lh}\}$ in accordance
with Fig. 6.16. The second step is to substitute these $\{q_{lh}\}$, together with
an appropriate choice of letter probabilities $\{p_l\}$, into the expression

$$R_0' = -\log_2 \sum_{h=1}^{Q} \left[\sum_{l=1}^{A} p_l \sqrt{q_{lh}}\right]^2. \tag{6.74}$$
We now apply these results to a particular ensemble of systems operating
over an additive white Gaussian noise channel. Each system utilizes a
modulator with transmitter letters $\{a_l\}$ equally spaced over the interval
$[-\sqrt{E_N}, +\sqrt{E_N}]$ and a receiver with a uniform quantization grid sim-
ilar to that shown in Fig. 6.25 for $A = 6$; the number, $Q$, of quantizer
output levels is equal to the number, $A$, of transmitter letters.

Figure 6.25 Uniform quantization, Q = A.

Curves of $R_0'$ as a function of $E_N/N_0$ for $Q = A = 2, 3, 4, 8, 16, 32,$
and 64, calculated on a computer, are plotted in Fig. 6.26. In each case
the letter probabilities $\{p_l\}$ have been set equal to $1/A$. For reference, the
upper envelope of the curves of the unquantized exponent $R_0$ is replotted
from Fig. 5.18. This upper envelope specifies the performance obtainable
in the absence of quantization with $A$ and $\{p_l\}$ optimized.

Figure 6.26 $R_0'$ for A-level amplitude modulation, quantized receiver (Q = A). (Horizontal axis: energy ratio per dimension, $10 \log_{10} E_N/N_0$.)

We observe from Fig. 6.26 that the best choice of $A$ depends on the
value of $E_N/N_0$. By choosing $A$ (as a function of $E_N/N_0$) to maximize $R_0'$,
we can operate along the upper envelope of the $R_0'$ curves. It must be
recalled, however, that fine quantization increases the cost of the decoder.
Thus it is desirable to select $A$ only large enough to yield an efficient set
of waveforms; that is, only large enough to prevent $R_0'$ from saturating
at $\log_2 A$. When this is done and $Q$ is set equal to $A$, the resulting degra-
dation due to quantization is approximately 2 db over the full range
of $E_N/N_0$.

The fact that $R_0'$ decreases as $A$ increases when $E_N/N_0$ is small is
attributable to choosing uniform $\{p_l\}$; for example, when $E_N/N_0$ is less
than 5 db and $A = 64$, $R_0'$ falls about 5 db to the right of $R_0$. Almost all
of this discrepancy is due to the fact that with uniform $\{p_l\}$ the mean
squared-length of a signal vector, hence the mean energy of a signal
waveform, approximates $N E_N/3$ (see also Eq. 5.57 et seq.). This dis-
crepancy would be eliminated if the $\{p_l\}$ were optimized: the letters
$+\sqrt{E_N}$ and $-\sqrt{E_N}$ would be used almost exclusively when $E_N/N_0$ is
small and the resulting mean signal energy would be three times greater.
For optimum $\{p_l\}$ and $Q = A$, the $R_0'$ curves increase monotonically
with increasing $A$ for all $E_N/N_0$, and approach the unquantized envelope
as $A \to \infty$.


Uncoded transmission. We now contrast the performance afforded by
coded systems with the performance obtainable in the absence of coding.
If the transmitter employs $A$ amplitude levels equally spaced between
$-\sqrt{E_N}$ and $+\sqrt{E_N}$ to communicate $M = A$ equally probable messages,
and if the uniform quantizer of Fig. 6.25 is used for making decisions, then
the resulting probability of error is

$$P[\mathcal{E}] = \frac{1}{A} \sum_{l=1}^{A} (1 - q_{ll}) = \frac{2(A - 1)}{A}\, Q\left(\frac{1}{A - 1}\sqrt{\frac{2 E_N}{N_0}}\right),$$

while the rate is

$$R_N = \log_2 A \text{ bits/dimension}.$$

Points are included on Fig. 6.26 to indicate for each value of $A$ the rate
and the value of $E_N/N_0$ necessary to achieve $P[\mathcal{E}] = 10^{-5}$ and $P[\mathcal{E}] = 10^{-10}$.
It is observed that for all $A$ coding affords an increase of between 2 and
3 db in the efficiency of energy utilization for $P[\mathcal{E}] = 10^{-5}$ and between
6 and 8 db for $P[\mathcal{E}] = 10^{-10}$.
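The energy ratios needed by an uncoded system can be estimated with a short computation. The sketch below assumes decision thresholds midway between adjacent letters, so that each interior letter is in error with probability $2Q(\ )$ and each end letter with probability $Q(\ )$, where the argument is half the letter spacing divided by the noise standard deviation $\sqrt{N_0/2}$. That threshold placement, the function names, and the bisection limits are assumptions for illustration and are not taken from Fig. 6.25.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x), expressed through math.erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def uncoded_error_probability(a_levels, en_over_n0):
    """P[E] = 2(A-1)/A * Q( sqrt(2 E_N/N_0) / (A-1) ), midpoint thresholds assumed."""
    arg = math.sqrt(2.0 * en_over_n0) / (a_levels - 1)
    return 2.0 * (a_levels - 1) / a_levels * q_function(arg)

def required_en_over_n0_db(a_levels, target, lo_db=-10.0, hi_db=80.0):
    """Bisection on 10 log10(E_N/N_0) for the ratio that just meets the target P[E]."""
    for _ in range(200):
        mid = 0.5 * (lo_db + hi_db)
        if uncoded_error_probability(a_levels, 10.0 ** (mid / 10.0)) > target:
            lo_db = mid
        else:
            hi_db = mid
    return hi_db

for a_levels in (2, 4, 8, 16):
    print(a_levels, math.log2(a_levels),
          round(required_en_over_n0_db(a_levels, 1e-5), 2),
          round(required_en_over_n0_db(a_levels, 1e-10), 2))
```

For $A = 2$ the expression reduces to $Q(\sqrt{2E_N/N_0})$, the crossover probability of Eq. 6.69c, which is a useful check on the assumed threshold placement.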

We may conclude initially that coding for a high signal-to-noise ratio 
Gaussian channel is not dramatically rewarding. However, it is wise to 
recall from the central limit theorem discussion that the assumption of 
Gaussian statistics may be very poor on the tails of the distribution; in 
particular, the probability of an atypically large noise may be orders of 
magnitude larger than that predicted by the Gaussian model. Con- 
sequently, it is doubtful that the performance of uncoded systems on 
physical channels will actually approach the performance predicted in 
Fig. 6.26. Of course, to some extent this same caution also applies to
coded systems. With coding, however, a low error probability is attained
by observing many noise samples rather than only one. For no single 
sample must the probability of a large noise be vanishingly small. Thus 
system performance with coding is less sensitive to the tails of the noise 
distribution than system performance without coding, and the Gaussian 
approximation is more tenable. 

6.3 BINARY CONVOLUTIONAL CODES 

The calculations of the preceding section have shown that it is possible
to choose a receiver quantization scheme in such a way that the achievable
error exponent $R_0'$ is degraded only slightly from the value it would have
without quantization. In making these calculations, we have presumed
that the quantizer is followed by an optimum decoder, that is, by a
computer that determines for which message $m_i$ the a posteriori proba-
bility $P[m_i \mid \mathbf{r}']$ is maximum, where $\mathbf{r}'$ denotes the quantizer output vector.

For equally likely messages the a posteriori probability is proportional
to $P[\mathbf{r}' \mid \mathbf{s}_i]$. But we have observed before that it is not possible in practice
to compute $P[\mathbf{r}' \mid \mathbf{s}_i]$ for every $i$ when $K$ is large and $M = 2^K = 2^{N R_N}$ is
enormous. The decoding problem is to avoid exponential growth in decoder
complexity as $K$ increases. Additional degradation in the error exponent
results from the necessity of settling for nonoptimum data processing in the
box labeled “Decoder” in Fig. 6.14. The remainder of this chapter is
devoted to exploring ways in which the degradation in error exponent
caused by decoder data processing may be made a small percentage
of $R_0'$.

An ideal decoding scheme would have the following attributes:

1. The probability of error would decrease exponentially with increasing
code constraint length $K$ in agreement with the random coding bound.

2. The size of the decoder would be proportional to $K$.

3. The required computational speed of the decoder would be inde-
pendent of $K$.

Unfortunately, so far no scheme exactly satisfying all three conditions has
been devised.

In spite of this, diverse approaches$^{31,57,66,96}$ to decoding have met with
significant success and can provide workable engineering solutions to
practical communication problems. We shall focus in particular on one
approach, called sequential decoding, which evidences operating char-
acteristics that in some regards approximate the ideal.

Sequential decoding procedures are applicable to a subclass of parity-
check codes called convolutional codes and to a broad class of channels.
It is easiest, however, to convey the central ideas and methodology by
concentrating attention on the binary symmetric channel (BSC).


The Binary Symmetric Channel 

The BSC has already been encountered in connection with Fig. 6.20;
it is obtained from the additive white Gaussian noise channel by restricting
the transmitted signal vectors $\{\mathbf{s}_i\}$ to lie on vertices of a hypercube and
symmetrically quantizing the components of the relevant received vector
into two levels. It is conventional to denote both the input and the output
alphabets of the resulting discrete channel by the symbols $\{0, 1\}$. With
this notation, the channel transition diagram of the BSC is that of
Fig. 6.27, in which $p$ denotes the probability that any particular
component of the transmitted vector will be received incorrectly.

Figure 6.27 Binary symmetric channel.

We may think of communicating over the BSC by feeding the vector
$\mathbf{y}$ (with components that are 0, 1) directly from the output of a parity-check
coder into the channel, as shown in Fig. 6.28. The channel output vector
$\mathbf{r}'$ is fed in turn directly into the decoding computer. The effects of the
transmitter modulator, the Gaussian channel noise, and the receiver
quantizer are all coalesced into the BSC transition probability parameter $p$.

Figure 6.28 Communication over a BSC.

When one of $M$ equally likely messages is communicated over the BSC
by means of a set of $N$-component binary vectors $\{\mathbf{y}_i\}$, the optimum
receiver compares the $N$-component received vector $\mathbf{r}'$ with each of the
$\{\mathbf{y}_i\}$ and determines for which $i$ the probability $P[\mathbf{r}' \mid \mathbf{y}_i]$ is maximum. The
probability that a transition occurs with any single use of the BSC is $p$ and
successive uses of the channel are statistically independent. Therefore,
whenever $\mathbf{r}'$ and $\mathbf{y}_i$ differ in $d_i$ coordinates,

$$P[\mathbf{r}' \mid \mathbf{y}_i] = p^{d_i} q^{N - d_i}; \quad q \triangleq 1 - p. \tag{6.75}$$

The quantity $d_i$ is called the Hamming$^{39}$ distance between $\mathbf{r}'$ and $\mathbf{y}_i$.

The right-hand side of Eq. 6.75 is a monotonically decreasing function
of $d_i$ for $p < \tfrac{1}{2}$. Accordingly, the optimum BSC decoder may determine
$\hat{m}$ by computing the set of Hamming distances $\{d_i\}$, $i = 0, 1, \ldots, M - 1$,
and setting $\hat{m} = m_k$ whenever $d_k$ is the smallest member of the set.

Since the vector $\mathbf{r}' \oplus \mathbf{y}_i$ contains a “1” only in coordinates in which $\mathbf{y}_i$
differs from $\mathbf{r}'$, the Hamming distance $d_i$ is conveniently obtained by form-
ing $\mathbf{r}' \oplus \mathbf{y}_i$ and counting the resulting number of 1's. By convention, the
number of 1's in any binary vector $\mathbf{a}$ is called its weight, denoted $w[\mathbf{a}]$.
With this notation,

$$d_i = w[\mathbf{r}' \oplus \mathbf{y}_i]. \tag{6.76}$$
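The distance computation of Eq. 6.76 and the resulting decision rule can be sketched in a few lines of Python. The four 5-component code vectors, the received vector, and the transition probability below are hypothetical values chosen for illustration, and the function names are not from the text.

```python
def xor(a, b):
    """Component-by-component modulo-2 sum of two binary vectors."""
    return [ai ^ bi for ai, bi in zip(a, b)]

def weight(a):
    """Number of 1's in a binary vector."""
    return sum(a)

def hamming_distance(r, y):
    """Eq. 6.76: d = w[r' XOR y]."""
    return weight(xor(r, y))

def likelihood(r, y, p):
    """Eq. 6.75: P[r' | y] = p^d q^(N - d), with q = 1 - p."""
    d = hamming_distance(r, y)
    return p ** d * (1.0 - p) ** (len(r) - d)

def decode(r, codewords):
    """Index of the code vector at minimum Hamming distance from r'."""
    return min(range(len(codewords)),
               key=lambda i: hamming_distance(r, codewords[i]))

# Hypothetical signal set and received vector (illustration only).
codewords = [[0, 0, 0, 0, 0],
             [1, 1, 1, 0, 0],
             [0, 0, 1, 1, 1],
             [1, 1, 0, 1, 1]]
received = [1, 0, 1, 0, 0]
p = 0.1

distances = [hamming_distance(received, y) for y in codewords]
ml_index = max(range(len(codewords)),
               key=lambda i: likelihood(received, codewords[i], p))
print("distances:", distances)
print("minimum-distance decision:", decode(received, codewords),
      " maximum-likelihood decision:", ml_index)
```

With $p < \tfrac{1}{2}$ the two decisions coincide, which is the content of the monotonicity remark following Eq. 6.75.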

Insight into the problem of communicating over a BSC is gained by
formulating the decision problem in geometrical terms analogous to those
with which we are already familiar. We begin by reviewing the additive
white Gaussian noise channel with binary transmission and unquantized
reception. In this case the modulator in Fig. 6.1 is capable of generating
any one of the $2^N$ hypercube vertices of an $N$-dimensional signal space.
For rates $R_N = K/N < 1$, the coder specifies a subset of $2^K$ of these ver-
tices as the signal set $\{\mathbf{s}_i\}$, $i = 0, 1, \ldots, 2^K - 1$. When the input messages
are equally likely, the optimum unquantized receiver sets $\hat{m} = m_k$ if the
received vector $\mathbf{r}$ lies closer in Euclidean distance to $\mathbf{s}_k$ than to any other
signal vector; that is, if $|\mathbf{r} - \mathbf{s}_k|$ is minimum.

When quantization is imposed on the matched filter outputs, the decoder
must make the decision $\hat{m}$ on the basis of the quantized output $\mathbf{r}'$, without
recourse to $\mathbf{r}$ itself. We interpret this decision geometrically by first
observing that the symmetric binary quantization of $\mathbf{r}$ corresponds to a
mapping of $\mathbf{r}$ into whichever hypercube vertex, say $\mathbf{v}$, is closest to $\mathbf{r}$. For
$R_N = K/N = 1$ and equally likely messages, the vertex $\mathbf{v}$ would itself
correspond to the most probable transmission (we first observed that
such dimension-by-dimension decisions were optimum for $R_N = 1$ in
Section 4.5, Eq. 4.90). For $R_N < 1$, however, the vertex $\mathbf{v}$ may not be a
signal vector; when $N$ is large, the fraction $2^{-N(1 - R_N)}$ of vertices that
are signal vectors is very small. The task of the decoder is to map $\mathbf{v}$ onto
one of the signal vectors $\{\mathbf{s}_i\}$. If $\mathbf{v}$ differs from a vector $\mathbf{s}_i$ in $h_i$ co-
ordinates, then

$$|\mathbf{v} - \mathbf{s}_i| = 2\sqrt{h_i E_N}, \tag{6.77}$$

where $E_N$ is the energy per component.

In BSC notation (with vector components 0, 1 rather than $+\sqrt{E_N}$,
$-\sqrt{E_N}$), the vertex $\mathbf{v}$ corresponds to the BSC output $\mathbf{r}'$, the signal $\mathbf{s}_i$ to the
BSC input $\mathbf{y}_i$, and the number of coordinate differences $h_i$ to the Hamming
distance $d_i$. In accordance with Eq. 6.75, the optimum decoder minimizes
$d_i$, hence $|\mathbf{v} - \mathbf{s}_i|$. Symmetric binary quantization followed by optimum
decoding thus corresponds to the two-step procedure of first minimizing
$|\mathbf{r} - \mathbf{v}|$ and then minimizing $|\mathbf{v} - \mathbf{s}_i|$. That the two-step decision may
not be optimum over-all is illustrated by Fig. 6.29, in which a vector $\mathbf{r}$ is
mapped onto $\mathbf{v}_5$ by the quantizer and thence onto $\mathbf{s}_1$ by the optimum
quantized decoder, even though $|\mathbf{r} - \mathbf{s}_1| > |\mathbf{r} - \mathbf{s}_0|$.
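A small numerical case makes the suboptimality of the two-step decision concrete. The sketch below does not reproduce the geometry of Fig. 6.29; it uses a hypothetical pair of three-component signals with $E_N = 1$ and a hypothetical received vector, and compares the minimum Euclidean distance decision with the decision obtained after symmetric binary quantization.

```python
import math

# Hypothetical signal set on hypercube vertices, E_N = 1 (illustration only).
signals = [(+1.0, +1.0, +1.0), (-1.0, -1.0, -1.0)]
r = (-0.1, -0.1, 2.0)                     # hypothetical unquantized received vector

def euclid(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def coordinate_differences(a, b):
    """Number of coordinates in which two vertices differ (h_i in Eq. 6.77)."""
    return sum(1 for x, y in zip(a, b) if x != y)

# Step 1 of the two-step procedure: map r onto the closest hypercube vertex v.
v = tuple(1.0 if x >= 0.0 else -1.0 for x in r)

soft = min(range(len(signals)), key=lambda i: euclid(r, signals[i]))
hard = min(range(len(signals)),
           key=lambda i: coordinate_differences(v, signals[i]))

print("vertex v:", v)
print("unquantized decision:", soft, " quantized decision:", hard)
print("|v - s_0| =", euclid(v, signals[0]),
      " 2 sqrt(h_0 E_N) =", 2.0 * math.sqrt(coordinate_differences(v, signals[0])))
```

Here two weakly negative components outvote one strongly positive component after quantization, so the quantized decoder selects the signal that is farther from $\mathbf{r}$ in Euclidean distance.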


Figure 6.29 Received vector $\mathbf{r}$ for which Q = 2 quantization followed
by optimum binary decoding does not select signal with minimum
Euclidean distance.

Figure 6.30 Each of the $2^N$ points represents a distinct $N$-component
binary vector. The circled points correspond to signal vectors.

Of course, a binary symmetric channel may also be derived from
quantization of channels that are not Gaussian and is a valid and interesting
mathematical abstraction in its own right. It is therefore instructive to
introduce a signal space formulation for the BSC that is not tied to
Euclidean hypercube geometry. In particular, we can view the set of all
possible ($N$-component) received vectors $\{\mathbf{r}'\}$ with component values 0 or 1
as the $2^N$ points of a discrete signal space (see Fig. 6.30). The $M$ vectors
$\{\mathbf{y}_i\}$ define a subset of these points and form a signal constellation with
intersignal Hamming distances $\{d_{ik}\}$ given by

$$d_{ik} = w[\mathbf{y}_i \oplus \mathbf{y}_k]; \quad 0 \le i, k \le M - 1. \tag{6.78}$$

The effect of transmitting a signal vector $\mathbf{y}$ over a BSC may be described
in terms of a random noise vector, $\mathbf{n} = (n_1, n_2, \ldots, n_N)$, defined by

$$\mathbf{n} = \mathbf{r}' \oplus \mathbf{y}.$$

From the definition, any component $n_j$ of $\mathbf{n}$ is 1 if a transmission error
occurs on the $j$th use of the channel and 0 otherwise. If $m = m_k$, we have

$$\mathbf{r}' = \mathbf{n} \oplus \mathbf{y}_k \tag{6.79a}$$

and

$$d_i = w[(\mathbf{n} \oplus \mathbf{y}_k) \oplus \mathbf{y}_i]. \tag{6.79b}$$

A complete analogy exists between the BSC and the white Gaussian
noise decision problems. In both cases we have a constellation of signal
vectors and an additive noise vector. The primary distinction is that with
the BSC addition is modulo-2 and distance is measured in terms of weight
rather than length. The utility of the analogy rests on the fact that in both
cases “probability” is monotonically related to “distance.” For example,
the optimum BSC receiver again partitions the received signal space into
$M$ disjoint decision regions $\{I_k\}$, as shown in Fig. 6.31. When every
message is equally likely, each region $I_k$, $k = 0, 1, \ldots, M - 1$, contains
all points in the received signal space that lie closer in Hamming distance
to $\mathbf{y}_k$ than to any other vector in the signal set $\{\mathbf{y}_i\}$.

Figure 6.31 Decision regions $\{I_k\}$ for a BSC communication system.

Convolutional Encoders 

Convolutional codes for use over a BSC may be generated by encoding
devices, like that diagrammed in Fig. 6.32, which are somewhat simpler
than block coders. Just as in the block coder of Fig. 6.6, we have a $K$-bit
$x$-register. But there is no $y$-register, and instead of $N$ modulo-2 adders
we now have only $v$ of them, where $v$ is typically quite small.

The connection diagram of the encoder is specified by a set of co-
efficients $\{g_{lj}\}$, $l = 1, 2, \ldots, K$ and $j = 1, 2, \ldots, v$. As with block
coders, $g_{lj} = 1$ means that the $l$th stage of the $x$-register is connected to
the $j$th adder, whereas $g_{lj} = 0$ means that it is not. We again find it
convenient to write the set of connections to the $l$th $x$-register stage as a
vector, say $\mathbf{g}_l$:

$$\mathbf{g}_l \triangleq (g_{l1}, g_{l2}, \ldots, g_{lv}); \quad l = 1, 2, \ldots, K. \tag{6.80}$$

As an example, for the particular $K = 4$, $v = 3$ encoder of Fig. 6.33,

$$\mathbf{g}_1 = (1, 1, 1)$$
$$\mathbf{g}_2 = (0, 1, 0)$$
$$\mathbf{g}_3 = (0, 1, 1)$$
$$\mathbf{g}_4 = (0, 1, 1).$$


A convolutional encoder operates as follows: assume that we wish to
communicate an $L$-bit message vector

$$\mathbf{x} = (x_1, x_2, \ldots, x_L), \tag{6.81}$$

in which $L$ may be greater than $K$. First, the contents of all $K$ stages of
the $x$-register are set equal to zero. Next, the first digit, $x_1$, of $\mathbf{x}$ is shifted
into stage 1 of the $x$-register. The $v$ modulo-2 adders are then sampled one
after the other by the commutator shown in Fig. 6.32 and presented to
the input of the BSC for transmission. When the $v$th-adder output has
been sampled and transmitted, the second message digit, $x_2$, is shifted into
stage 1 of the $x$-register, which causes $x_1$ to shift into stage 2. Each of the
$v$ modulo-2 adder outputs is again sampled and transmitted. This pro-
cedure continues until the last component, $x_L$, of $\mathbf{x}$ has been shifted into
stage 1 of the $x$-register. Then, with each adder output still being sampled
and transmitted after each shift, $K$ 0's are fed in turn into the $x$-register,
thereby returning it to its initial condition. During each shift the digit
forced out of the $K$th stage of the $x$-register is discarded.†

Figure 6.33 A particular K = 4, v = 3 convolutional encoder.

The output sequence produced by an $L$-component input vector is
$(L + K)v$ digits long. We denote this output sequence by a vector $\mathbf{y}$.

As an example of the convolutional encoding procedure, reconsider the
($K = 4$, $v = 3$) coder of Fig. 6.33. It may be verified directly that if the
message input is the 5-bit sequence

$$\mathbf{x} = (1, 0, 1, 1, 0),$$

the encoder output sequence is

$$\mathbf{y} = (111, 010, 100, 110, 001, 000, 011, 000, 000).$$

(For clarity, commas have been used to indicate the shifting of the
$x$-register and deleted elsewhere.)
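The encoding procedure just described can be sketched as a shift-register simulation. The connection vectors below are those quoted for the $K = 4$, $v = 3$ encoder of Fig. 6.33; the function name, the list-based register, and the grouping of the output into $v$-digit tuples are illustrative choices rather than anything prescribed by the text.

```python
def conv_encode(x, g):
    """Encode message x with connection vectors g (g[l-1] belongs to stage l).

    The register starts at zero; after each shift the v modulo-2 adders are
    sampled in turn. A tail of K zeros follows the message. Returns a list of
    v-digit tuples, one per shift.
    """
    K, v = len(g), len(g[0])
    register = [0] * K
    groups = []
    for bit in list(x) + [0] * K:
        # New digit enters stage 1; the digit leaving stage K is discarded.
        register = [bit] + register[:-1]
        groups.append(tuple(
            sum(register[l] * g[l][j] for l in range(K)) % 2
            for j in range(v)))
    return groups

# Connection vectors of the K = 4, v = 3 example.
G = [(1, 1, 1), (0, 1, 0), (0, 1, 1), (0, 1, 1)]

y = conv_encode([1, 0, 1, 1, 0], G)
print(", ".join("".join(str(d) for d in grp) for grp in y))
```

The printed groups can be compared directly with the nine three-digit groups of the example output sequence; the total length is $(L + K)v = 27$ digits.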

In application, one is usually concerned with a message input vector $\mathbf{x}$
that is much longer than the $x$-register, that is, $L \gg K$. In such a case the
tail of zeros added to $\mathbf{x}$ is much shorter than $\mathbf{x}$ itself and the ratio of the
number of message digits to the number of transmitted digits is approxi-
mately $1/v$. We therefore define the rate of a binary convolutional code
of the type described‡ as

$$R_N = \frac{1}{v} \text{ (bits per transmitted symbol)}. \tag{6.82}$$

Each message input digit remains within the $x$-register during $K$
samplings of the modulo-2 adders, hence affects $Kv$ transmitted digits.
We hereafter refer to $K$ as the constraint span (measured in message input
bits) of a convolutional code.

† It would suffice to introduce only $(K - 1)$ 0's into the $x$-register following the last
digit of a message, since this last digit is shifted out of the register when the first digit
of a new message is shifted in.

‡ It is also possible$^{90}$ to generate convolutional codes of rate $R_N = u/v$, where $u$ is
any positive integer less than $v$.


Figure 6.34 A set of $2^L = 16$ input vectors $\{\mathbf{x}_i\}$ diagrammed on an input tree.

Tree structure. We now consider in more detail how a convolutional
coder constructs the output codeword $\mathbf{y}$ from the input

$$\mathbf{x} = (x_1, x_2, \ldots, x_L).$$

Since the $x$-register is initially set to zero, the first $v$ digits of $\mathbf{y}$, obtained by
shifting the first component of $\mathbf{x}$ into stage 1 and sampling the $v$ adders,
depend only on $x_1$. Similarly, the second $v$ digits depend only upon $x_1$ and
$x_2$. In general, the $v$ digits of $\mathbf{y}$ obtained immediately after shifting
component $x_h$ into the $x$-register depend only on $x_h$ and the $(K - 1)$
components of $\mathbf{x}$ preceding $x_h$. This implies that if two input vectors agree
in their first $(h - 1)$ coordinates the corresponding output vectors agree
in their first $(h - 1)v$ coordinates.
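The prefix property stated above is what gives the code its tree structure, and it can be confirmed by exhaustive enumeration for a short message length. The sketch below assumes the $K = 4$, $v = 3$ connection vectors of the Fig. 6.33 example and $L = 5$; the function names and the brute-force comparison over all pairs of inputs are illustrative choices.

```python
from itertools import product

def conv_encode_flat(x, g):
    """Shift-register encoder returning the output as one flat list of digits."""
    K, v = len(g), len(g[0])
    register = [0] * K
    out = []
    for bit in list(x) + [0] * K:          # message followed by a tail of K zeros
        register = [bit] + register[:-1]
        out.extend(sum(register[l] * g[l][j] for l in range(K)) % 2
                   for j in range(v))
    return out

def common_prefix_length(a, b):
    """Number of leading coordinates in which two sequences agree."""
    n = 0
    for ai, bi in zip(a, b):
        if ai != bi:
            break
        n += 1
    return n

G = [(1, 1, 1), (0, 1, 0), (0, 1, 1), (0, 1, 1)]
L, V = 5, 3
inputs = list(product((0, 1), repeat=L))

# Inputs agreeing in their first (h - 1) digits must yield outputs agreeing
# in at least their first (h - 1) v digits.
prefix_property_holds = all(
    common_prefix_length(conv_encode_flat(x1, G), conv_encode_flat(x2, G))
    >= V * common_prefix_length(x1, x2)
    for x1 in inputs for x2 in inputs)
print("prefix property holds for all", len(inputs) ** 2, "pairs:",
      prefix_property_holds)
```

For this particular encoder the first connection vector is $(1, 1, 1)$, so two paths that first disagree in input digit $h$ differ already in the first output digit of branch $h$.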

The resulting structure of the set of all output code words may be
placed in evidence by means of a code tree, obtained as follows: first, as
shown in Fig. 6.34, the set of all $2^L$ $L$-component input vectors $\{\mathbf{x}_i\}$ is
diagrammed on an input tree by adopting the convention that the upper
branch diverging from any node of the tree corresponds to shifting a 0 into
the $x$-register and the lower branch to shifting in a 1. Thus each input
vector $\mathbf{x}_i$ designates a distinct path all the way through the input tree to
one of $2^L$ terminal nodes. The association of the $\{\mathbf{x}_i\}$ and the paths is
indicated in the figure.

Next consider any intermediate node of the input tree: the path leading
up to this node designates the contents of the $x$-register just before a new
input digit is shifted in, and the contents of the $x$-register immediately
thereafter determines the next $v$ digits of $\mathbf{y}$. Thus we may associate with
the upper branch stemming from each intermediate node of the input tree
the $v$ output digits that are generated when this new input digit is a 0 and
with the lower branch the $v$ output digits that are generated when the new
input digit is a 1. The code tree is obtained by writing along each branch
of the input tree the $v$ digits of $\mathbf{y}$ associated therewith. For example, the
code tree generated by the particular $K = 4$, $v = 3$ convolutional encoder
of Fig. 6.33 is illustrated for input sequences of length $L = 5$ in Fig. 6.35.

We can interpret the message input $\mathbf{x}$ as a set of $L$ successive instructions
that tell the encoder which path of the code tree to follow. The trans-
mitted vector $\mathbf{y}$ is the sequence of $(L + K)v$ binary digits that lies along the
designated path.

Linearity. Additional insight into the structure of the code tree may
be gained by exploiting the fact that the convolutional encoder of Fig. 6.32
is a parity-check device; the output of each modulo-2 adder at any instant
is 0 if the number of 1's stored in the stages of the x-register to which the
adder is connected is even and 1 if the number is odd. Just as for block
parity-check coders, a convolutional coder is linear in the sense that when
$\mathbf{x} = (x_1, x_2, \ldots, x_L)$ is the coder input the output $\mathbf{y}$ may be written

    $\mathbf{y} = x_1 \mathbf{f}_1 \oplus x_2 \mathbf{f}_2 \oplus \cdots \oplus x_L \mathbf{f}_L.$    (6.83)

Equation 6.83 is similar in form to the block parity-check code relationship
of Eq. 6.28,

    $\mathbf{y} = x_1 \mathbf{f}_1 \oplus x_2 \mathbf{f}_2 \oplus \cdots \oplus x_K \mathbf{f}_K,$

in which each $\mathbf{f}_h$, $h = 1, 2, \ldots, K$, is an $N$-component vector describing
the connections between the $N$ modulo-2 adders and the $h$th stage of the
$K$-bit x-register shown in Fig. 6.6. But for convolutional codes the
interpretation of the $\{\mathbf{f}_h\}$ is different; they are not connection vectors.

Figure 6.35 Code tree for encoder of Figure 6.33 ($K = 4$, $L = 5$, $v = 3$).
(Figure labels: output digits due to coder input; "tail" of $K$ 0's.)

The identity of the convolutional $\{\mathbf{f}_h\}$ is readily established by noting
in Eq. 6.83 that when the $L$-bit input vector $\mathbf{x}$ has $x_h = 1$ and all other
components equal to zero the output vector $\mathbf{y}$ is $\mathbf{f}_h$. In particular, by letting

    $\mathbf{x} = (1, 0, 0, \ldots, 0)$

and recalling from Eq. 6.80 that the vector $\mathbf{g}_j$ designates which of the
$v$ modulo-2 adders are connected to the $j$th stage of the convolutional
x-register, we identify $\mathbf{f}_1$ from Fig. 6.32 as the resulting output

    $\mathbf{y} = \mathbf{f}_1 = (\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_K, \underbrace{\mathbf{0}, \mathbf{0}, \ldots, \mathbf{0}}_{L\ \mathbf{0}\text{'s}}).$    (6.84a)

Here we have used the symbol $\mathbf{0}$ to denote the $v$-component vector, each
component of which is 0. The commas again denote the points at which
the x-register shifts right.

Figure 6.36 Diagram of the $\{\mathbf{f}_h\}$ for a convolutional coder with $K = 4$, $L = 6$. Empty
slots contain $v$ zeros. (A tenth column of all zeros is omitted.)

It is clear from Fig. 6.32 and the description of the convolutional
encoding operation that the output $\mathbf{y}$ when only the second digit of $\mathbf{x}$ is
1 is a delayed replica of the output when only $x_1 = 1$. Thus

    $\mathbf{x} = (0, 1, 0, \ldots, 0)$

implies

    $\mathbf{y} = \mathbf{f}_2 = (\mathbf{0}, \mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_K, \underbrace{\mathbf{0}, \ldots, \mathbf{0}}_{(L-1)\ \mathbf{0}\text{'s}}).$    (6.84b)

If we delete explicit mention of the vectors $\{\mathbf{0}\}$, the $\{\mathbf{f}_h\}$ may be described
pictorially as shown in Fig. 6.36 for the case $K = 4$, $L = 6$. We note from
observation of the $h$th column of the figure that the $v$ digits of $\mathbf{y}$ produced
when $x_h$ is first shifted into the x-register are

    $x_h \mathbf{g}_1 \oplus x_{h-1} \mathbf{g}_2 \oplus \cdots \oplus x_{h-K+1} \mathbf{g}_K,$    (6.85)

which is an expression with indices of convolutional form. (We define
$x_{-l} = 0$ for all $l \ge 0$.)

It is convenient to refer to the $Kv$-component vector

    $\mathbf{g} = (\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_K)$    (6.86)

as the generator of the convolutional code; clearly, $\mathbf{g}$ completely specifies
the coder connections. The nonzero segments of the $\{\mathbf{f}_h\}$ are just successive
$v$-digit translates of $\mathbf{g}$.
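
The equivalence between the convolutional form of Eq. 6.85 and the superposition of Eq. 6.83 can be checked numerically. The sketch below is illustrative only: the function names and the K = 3, v = 3 generator are hypothetical choices, and each $\mathbf{g}_j$ is represented as a tuple of v bits.

```python
from typing import List, Sequence

Block = Sequence[int]

def encode_by_convolution(x: Sequence[int], g: Sequence[Block]) -> List[int]:
    """Eq. 6.85 applied column by column, with x_i = 0 outside 1..L."""
    K, v, L = len(g), len(g[0]), len(x)
    y: List[int] = []
    for h in range(1, L + K + 1):                 # columns 1 .. L + K
        for a in range(v):
            digit = 0
            for j in range(1, K + 1):             # term x_{h-j+1} g_j
                i = h - j + 1
                if 1 <= i <= L:
                    digit ^= x[i - 1] & g[j - 1][a]
            y.append(digit)
    return y

def f_vector(h: int, g: Sequence[Block], L: int) -> List[int]:
    """f_h: (h - 1) zero blocks, then g, then (L - h + 1) zero blocks."""
    v = len(g[0])
    flat_g = [bit for block in g for bit in block]
    return [0] * ((h - 1) * v) + flat_g + [0] * ((L - h + 1) * v)

def encode_by_superposition(x: Sequence[int], g: Sequence[Block]) -> List[int]:
    """Eq. 6.83: modulo-2 sum of the f_h selected by the 1's of x."""
    K, v, L = len(g), len(g[0]), len(x)
    y = [0] * ((L + K) * v)
    for h in range(1, L + 1):
        if x[h - 1]:
            y = [p ^ q for p, q in zip(y, f_vector(h, g, L))]
    return y

g_example = [(1, 1, 1), (1, 0, 1), (0, 1, 1)]     # hypothetical K = 3, v = 3 generator
x_example = [1, 0, 1, 1]
same = (encode_by_convolution(x_example, g_example)
        == encode_by_superposition(x_example, g_example))
```

The helper `f_vector` reproduces the staircase pattern of Fig. 6.36: each $\mathbf{f}_h$ is the generator shifted right by $(h - 1)v$ digits.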

Error Probability 

For any particular choice of $\mathbf{g}$, the output of a convolutional coder is the
vector $\mathbf{y}$ specified in terms of the $\{\mathbf{f}_h\}$ and the input $\mathbf{x}$ by Eq. 6.83. Thus the
set of $2^L$ possible outputs $\{\mathbf{y}_i\}$ can be generated not only in the way that we
have described but also (in accordance with Eq. 6.28) by a block parity-check
encoder that accepts input vectors of length $L$ and generates output
vectors of length $(K + L)v$. The convolutional $\{\mathbf{f}_h\}$ of Fig. 6.36 would be
the connection vectors of this equivalent block coder.

Although convolutional codes for input vectors of fixed length $L$ may
be thought of as a special form of $L$-bit block codes, it does not follow that
convolutional codes exist for which the attainable error probability obeys
the ensemble block-code bound of Eq. 6.62,

    $\overline{P[\mathcal{E}]} < 2^{-N[R_0' - R_N]},$

with $N = (L + K)v$. The proof that this bound is valid for block parity-check
codes depends strongly on freedom to choose the block-coder
connection vectors arbitrarily, a freedom of choice that is not available
when the $\{\mathbf{f}_h\}$ are constrained to have convolutional form: with unconstrained
block codes, each component of the input vector $\mathbf{x}$ can affect
any component of the entire output vector $\mathbf{y}$, whereas with convolutional
codes each component of $\mathbf{x}$ can affect only $Kv$ components of $\mathbf{y}$.

Because of the constraints on the $\{\mathbf{f}_h\}$, we do not anticipate that the
error probability with convolutional codes can be forced toward zero with
an exponent that is proportional to $(L + K)$. Indeed, intuition correctly
informs us that the probability of at least one error in a block of $L$ input
digits must tend toward 1, not 0, if $L$ is increased while $K$ is held fixed.
On the other hand, it is reasonable to anticipate a bound on error probability
that decreases exponentially with an exponent that is linear in the
code constraint span $K$.

With convolutional codes, it is difficult (perhaps impossible) to employ
random-coding arguments directly to analyze an optimum decoder; this
is because of the way in which successive digits of $\mathbf{x}$ affect overlapping
segments of $\mathbf{y}$. Recognizing this, we consider a suboptimum decoder
instead. For this decoder, we shall derive an ensemble error probability
bound that does decrease exponentially with $K$. The suboptimum decoding
procedure provides preliminary insight into sequential decoding.

Suboptimum decoding. For a convolutional coder used to communicate
one of $2^L$ binary input vectors $\{\mathbf{x}_i\}$ over a BSC, the suboptimum
decoding procedure with which we are concerned is described as follows.
The decoder decides on each of the $L$ components $x_1, x_2, \ldots, x_L$ of the
coder input vector in turn, one after the other, to produce a sequence of
decisions $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_L$. Each decision $\hat{x}_h$, $h = 1, 2, \ldots, L$, is based
exclusively on (a) the previous decisions $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{h-1}$, and (b) the
$Kv$-digit span of the received vector that is directly affected by $x_h$. We
refer to this span of received digits as the ($Kv$-component) vector ${}_h\mathbf{r}$, and
to the intermediate code-tree node specified by $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{h-1}$ as
the $h$th starting node. Each $x_h$ is decoded in turn by determining which
one of the $2^K$ $K$-branch codeword segments that diverge from the $h$th
starting node is the most probable cause of ${}_h\mathbf{r}$. In view of Eq. 6.75, the
decoder calculates the Hamming distance between each such $Kv$-digit codeword
segment and ${}_h\mathbf{r}$. If the codeword segment with the smallest distance
leaves the $h$th starting node along the upper branch, the decoder sets
$\hat{x}_h = 0$; otherwise it sets $\hat{x}_h = 1$. A typical decoder progression is
illustrated in Fig. 6.37 for a convolutional code with $K = 3$.
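
The procedure can be sketched as a brute-force search, assuming a small hypothetical K = 3, v = 3 generator so that the $2^K$ comparisons stay cheap. The names `outputs`, `hamming`, and `suboptimum_decode` are illustrative, and ties in distance are broken arbitrarily (by enumeration order), a detail the text does not specify.

```python
from itertools import product
from typing import List, Sequence

def outputs(history: Sequence[int], new_bits: Sequence[int], g) -> List[int]:
    """Digits generated while new_bits are shifted in after history."""
    K, v = len(g), len(g[0])
    bits = [0] * K + list(history) + list(new_bits)   # leading zeros: empty register
    out: List[int] = []
    for i in range(len(bits) - len(new_bits), len(bits)):
        for a in range(v):
            out.append(sum(bits[i - j] & g[j][a] for j in range(K)) % 2)
    return out

def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of coordinates in which a and b differ."""
    return sum(p != q for p, q in zip(a, b))

def suboptimum_decode(r: Sequence[int], g, L: int) -> List[int]:
    """Decide x_1, ..., x_L in turn from previous decisions and a Kv-digit span."""
    K, v = len(g), len(g[0])
    decisions: List[int] = []
    for h in range(L):
        span = r[h * v:(h + K) * v]                   # Kv digits affected by x_h
        # Compare the span with all 2^K K-branch segments leaving the starting node.
        best = min(product((0, 1), repeat=K),
                   key=lambda c: hamming(outputs(decisions, c, g), span))
        decisions.append(best[0])                     # keep only the first branch
    return decisions

g_example = [(1, 1, 1), (1, 0, 1), (0, 1, 1)]         # hypothetical K = 3, v = 3
x_example = [1, 0, 1, 1, 0]
y_example = outputs([], x_example + [0] * 3, g_example)   # message plus tail of K zeros
r_example = list(y_example)
r_example[4] ^= 1                                     # one channel error
decoded = suboptimum_decode(r_example, g_example, len(x_example))
```

The cost per decision grows as $2^K$, which is the exponential burden that sequential decoding is later introduced to avoid.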

We first consider any particular convolutional code and devote the next
few subsections to bounding the probability, $P[\mathcal{E}]$, that at least one error
will be made by the suboptimum decoder in the decision sequence
$\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_L$. Denote by $P[\mathcal{E}_h]$ the conditional probability of an error
on the $h$th decision, given that the $h$th starting node is correct. We bound
the (unconditioned) probability $P[\mathcal{E}]$ by deriving the sequence of equations

    I.   $P[\mathcal{E}] \le \sum_{h=1}^{L} P[\mathcal{E}_h],$

    II.  $P[\mathcal{E}_h] = P[\mathcal{E}_1],$

    III. $P[\mathcal{E}_1] = P[\mathcal{E}_1 \mid \mathbf{x}_0],$

in which the condition on the right-hand side of III indicates that the
all-zero message sequence is transmitted. It follows immediately that

    $P[\mathcal{E}] \le L\, P[\mathcal{E}_1 \mid \mathbf{x}_0].$

Finally, we average both sides of this equation over an ensemble of
communication systems, each of which uses a different convolutional code,
to establish the bound

    IV.  $\overline{P[\mathcal{E}]} < L\, 2^{-N[R_0' - R_N]}.$

Figure 6.37 Progression of suboptimum decoder for $K = 3$; the segments of the
received vector $\mathbf{r}$ for determining $\hat{x}_1$, $\hat{x}_2$, and $\hat{x}_3$ are ${}_1\mathbf{r}$, ${}_2\mathbf{r}$, and ${}_3\mathbf{r}$. (1) is the starting node,
and the box labeled I encloses the pertinent codeword segments, say $\{{}_1\mathbf{y}_i\}$, for determining
$\hat{x}_1$. If $\hat{x}_1 = 1$, (2) is the starting node, and the box labeled II encloses the pertinent
codeword segments, say $\{{}_2\mathbf{y}_i\}$, for determining $\hat{x}_2$. If $\hat{x}_1 = 1$ and $\hat{x}_2 = 0$, (3) is the
starting node, and the box labeled III encloses the pertinent codeword segments, say
$\{{}_3\mathbf{y}_i\}$, for determining $\hat{x}_3$.

Equation IV is a bound on the mean probability that one or more
errors will be made in decoding an $L$-bit message input sequence with
convolutional coding, a BSC, and the suboptimum decoder that we have
described. The code constraint length is $K = N R_N = N/v$, and $R_0'$ is
the BSC error exponent of Eq. 6.70,

    $R_0' = 1 - \log_2 [1 + 2\sqrt{p(1 - p)}].$

This performance is comparable to that achievable by block coding
with constraint length $K$ and optimum decoding, for which the union
bound on the probability of failing to decode $L/K$ successive blocks correctly is

    $\overline{P[\mathcal{E}]} < \frac{L}{K}\, 2^{-N[R_0' - R_N]}.$    (6.87a)

The difference in the tightness of the two bounds is not exponentially
significant. This is evident when the convolutional bound, IV, is rewritten
in the form

    $\overline{P[\mathcal{E}]} < \frac{L}{K}\, 2^{-N[R_0' - R_N] + \log_2 K} = \frac{L}{K}\, 2^{-N[R_0' - R_N - (1/N)\log_2 (N/v)]}.$    (6.87b)

As $N$ gets large, $(1/N) \log_2 N$ approaches zero and the bounds of Eqs. 6.87a
and b are substantially equivalent.†
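
A short numerical sketch makes the comparison concrete. The function names and the parameter values below are illustrative assumptions, not values from the text; the formulas are those of Eq. 6.70, bound IV, and Eq. 6.87a with $N = Kv$ and $R_N = 1/v$.

```python
import math

def r0_prime(p: float) -> float:
    """BSC exponent R_0' = 1 - log2[1 + 2 sqrt(p(1 - p))] (Eq. 6.70)."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))

def convolutional_bound(L: int, K: int, v: int, p: float) -> float:
    """Bound IV: L * 2^(-N [R_0' - R_N]) with N = Kv and R_N = 1/v."""
    N = K * v
    return L * 2.0 ** (-N * (r0_prime(p) - 1.0 / v))

def block_bound(L: int, K: int, v: int, p: float) -> float:
    """Eq. 6.87a: (L/K) * 2^(-N [R_0' - R_N]) for L/K successive blocks."""
    return convolutional_bound(L, K, v, p) / K

# Illustrative numbers only (not taken from the text).
L, K, v, p = 1000, 40, 3, 0.01
ratio = convolutional_bound(L, K, v, p) / block_bound(L, K, v, p)
```

The two bounds differ only by the factor $K$, which is negligible against an exponent that grows linearly with $N = Kv$.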

The proofs of Eqs. I, II, III, and IV that follow are somewhat detailed 
and may be omitted on a first reading. 

Proof of I. To prove I, assume initially that a magic genie directs
the decoder to the correct starting node for determining each $\hat{x}_h$, $h = 1, 2, \ldots, L$.
By definition, the probability that $\hat{x}_h$ is then incorrect is
$P[\mathcal{E}_h]$. Employing the familiar union argument, we overbound the
probability of one or more errors in $L$ successive decisions of the
genie-aided decoder by $\sum_{h=1}^{L} P[\mathcal{E}_h]$.

Next we observe that in the absence of decoding errors the starting node
for each $\hat{x}_h$ is correctly determined by preceding decoder decisions. If no
errors are made with the genie, no errors are made without him! Since
the converse is also true, the probability of at least one decoding error is
unaffected by the presence or absence of the genie and

    $P[\mathcal{E}] \le \sum_{h=1}^{L} P[\mathcal{E}_h].$    (6.88)

† It may be argued that convolutional codes should actually afford error performance
superior to that of block codes, but we do not know of any proof that this is so.

Figure 6.38 Construction of codeword segments; $K = 4$. From Eq. 6.83,

    $\mathbf{y} = x_1 \mathbf{f}_1 \oplus x_2 \mathbf{f}_2 \oplus \cdots \oplus x_L \mathbf{f}_L.$

(a) In determining $\hat{x}_1$ we consider only codeword segments having the form

    ${}_1\mathbf{y} = x_1 \mathbf{f}_1' \oplus x_2 \mathbf{f}_2' \oplus \cdots \oplus x_K \mathbf{f}_K'$ = first $Kv$ digits of $\mathbf{y}$.

The $\{\mathbf{f}_h'\}$ are the portions of the $\{\mathbf{f}_h\}$ enclosed within the box labeled I. (b) Codeword
segments pertinent to determining $\hat{x}_3$ depend only on the portions of the $\{\mathbf{f}_h\}$ enclosed
in boxes III and III'. The portions of the $\{\mathbf{f}_h\}$ determining ${}_3\mathbf{a}$ are enclosed in box III'.


Proof of II. Proof that the probability of decoding $x_h$ incorrectly
is the same for all $h$, given in each case the correct starting node, hinges on
the structure of the code tree. Denote the $Kv$-digit codeword segments
that enter into the decision $\hat{x}_1$ by the special symbols $\{{}_1\mathbf{y}_i\}$, $i = 0, 1, \ldots, 2^K - 1$.
The particular codeword segment ${}_1\mathbf{y}_k$ is transmitted when
$(x_1, x_2, \ldots, x_K)$ is the binary representation of the number $k$: we write

    ${}_1\mathbf{y} = {}_1\mathbf{y}_k \iff (x_1, x_2, \ldots, x_K) = k.$    (6.89)

The construction of the $Kv$-digit transmitted segment ${}_1\mathbf{y}$ may be identified
with the help of Fig. 6.38a as

    ${}_1\mathbf{y} = x_1 \mathbf{f}_1' \oplus x_2 \mathbf{f}_2' \oplus \cdots \oplus x_K \mathbf{f}_K',$    (6.90a)

in which we have introduced the definition

    $\mathbf{f}_h'$ = the first $Kv$ digits of $\mathbf{f}_h$;  $h = 1, 2, \ldots, K$.    (6.90b)

Equation 6.90a follows from Eq. 6.83 by truncating both sides, hence each
$\mathbf{f}_h$, after $Kv$ digits. We observe that $\mathbf{f}_1' = \mathbf{g}$, where $\mathbf{g}$ is the generator
sequence of the convolutional coder.

In similar fashion we next denote the set of $Kv$-digit codeword segments
that enter into the determination of $\hat{x}_h$, given the correct starting node,
by $\{{}_h\mathbf{y}_i\}$, $i = 0, 1, \ldots, 2^K - 1$. As before, the particular segment ${}_h\mathbf{y}_k$ is
transmitted when $(x_h, x_{h+1}, \ldots, x_{h+K-1})$ is the binary representation of
the number $k$:

    ${}_h\mathbf{y} = {}_h\mathbf{y}_k \iff (x_h, x_{h+1}, \ldots, x_{h+K-1}) = k.$    (6.91)

It is clear from Fig. 6.38b that for any $h$

    ${}_h\mathbf{y}_i = {}_1\mathbf{y}_i \oplus {}_h\mathbf{a};  i = 0, 1, \ldots, 2^K - 1,$    (6.92)

in which ${}_h\mathbf{a}$ is a binary vector, independent of $i$, which is determined by the
input bits that precede $x_h$ into the encoder. Knowledge of the $h$th starting
node implies knowledge of ${}_h\mathbf{a}$.

Now consider the decision $\hat{x}_h$, given that the correct starting node is
known. When the $k$th segment is transmitted, the suboptimum receiver
compares the received segment

    ${}_h\mathbf{r} = {}_h\mathbf{y}_k \oplus {}_h\mathbf{n} = {}_1\mathbf{y}_k \oplus {}_h\mathbf{a} \oplus {}_h\mathbf{n}$    (6.93a)

with each of the possible transmitted segments

    ${}_h\mathbf{y}_i = {}_1\mathbf{y}_i \oplus {}_h\mathbf{a};  i = 0, 1, \ldots, 2^K - 1.$    (6.93b)

Here, ${}_h\mathbf{n}$ denotes the $Kv$ channel noise digits that are present in ${}_h\mathbf{r}$. But
the decision $\hat{x}_h$ is unaltered if the known vector ${}_h\mathbf{a}$ is added modulo-2 (a
reversible operation!) both to ${}_h\mathbf{r}$ and to each of the $\{{}_h\mathbf{y}_i\}$. Let this be done.
It is clear from Eqs. 6.93a and b that the decision $\hat{x}_h$, now based on comparing
$({}_1\mathbf{y}_k \oplus {}_h\mathbf{n})$ with each of the $\{{}_1\mathbf{y}_i\}$, has the same error probability as
the decision $\hat{x}_1$ when ${}_1\mathbf{y}_k$ is transmitted. It is only necessary to use the
fact that the channel noise is stationary, so that

    $P[{}_h\mathbf{n} = \mathbf{a}] = P[{}_1\mathbf{n} = \mathbf{a}]$;  for all $Kv$-digit binary vectors $\mathbf{a}$.    (6.94)

We therefore have

    $P[\mathcal{E}_h \mid (x_h, x_{h+1}, \ldots, x_{h+K-1}) = k] = P[\mathcal{E}_1 \mid (x_1, x_2, \ldots, x_K) = k]$;  all $h$, $k$.    (6.95)

If all encoder input vectors are equally likely, then

    $P[(x_h, x_{h+1}, \ldots, x_{h+K-1}) = k] = P[(x_1, x_2, \ldots, x_K) = k]$;  all $k$,    (6.96)

and both sides of Eq. 6.95 may be averaged to yield

    $P[\mathcal{E}_h] = P[\mathcal{E}_1],$    (6.97)

which is Eq. II. In the next subsection we show that
$P[\mathcal{E}_1 \mid (x_1, x_2, \ldots, x_K) = k]$
is independent of $k$. It follows that both sides of Eq. 6.95 are independent
of $k$, and Eq. 6.97 remains valid even when the $\{\mathbf{x}_i\}$ are not equally likely.

The proof of Eq. 6.97 depends heavily on the fact that the correct starting
node for the decision $\hat{x}_h$, hence ${}_h\mathbf{a}$, is known. If the $h$th starting node is
incorrect, ${}_h\mathbf{a}$ cannot be correctly accounted for, and the probability that
$\hat{x}_h$ will be incorrect becomes large. We discuss this property at the
conclusion of the chapter.

Proof of III. We now show that for our suboptimum receiver,

    $P[\mathcal{E}_1] = P[\mathcal{E}_1 \mid \mathbf{x}_0] = P[\mathcal{E}_1 \mid \mathbf{x}_k]$;  all $k$.    (6.98)

Proof depends on a closure property of the truncated codeword segments
$\{{}_1\mathbf{y}_i\}$. As with block parity-check codes, the linearity of Eq. 6.90a

    ${}_1\mathbf{y} = x_1 \mathbf{f}_1' \oplus x_2 \mathbf{f}_2' \oplus \cdots \oplus x_K \mathbf{f}_K'$

ensures that the modulo-2 sum of any two vectors in $\{{}_1\mathbf{y}_i\}$ is also in $\{{}_1\mathbf{y}_i\}$.
For convolutional codes we also have the following stronger closure
property:


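Let $S_0$ denote the subset of all $2^{K-1}$ vectors in $\{{}_1\mathbf{y}_i\}$ consistent with
$x_1 = 0$, and let $S_1$ denote the subset consistent with $x_1 = 1$. Thus $S_1$
encompasses all ${}_1\mathbf{y}_i$ that contain $\mathbf{f}_1'$, and $S_0$ encompasses all ${}_1\mathbf{y}_i$ that do not.
Since $\mathbf{f}_1' \oplus \mathbf{f}_1' = \mathbf{0}$, we have

    ${}_1\mathbf{y}_k$ in $S_0$ and ${}_1\mathbf{y}_i$ in $S_0$, or
    ${}_1\mathbf{y}_k$ in $S_1$ and ${}_1\mathbf{y}_i$ in $S_1$
        $\Rightarrow ({}_1\mathbf{y}_k \oplus {}_1\mathbf{y}_i)$ in $S_0$,    (6.99a)

whereas

    ${}_1\mathbf{y}_k$ in $S_1$ and ${}_1\mathbf{y}_i$ in $S_0$, or
    ${}_1\mathbf{y}_k$ in $S_0$ and ${}_1\mathbf{y}_i$ in $S_1$
        $\Rightarrow ({}_1\mathbf{y}_k \oplus {}_1\mathbf{y}_i)$ in $S_1$.    (6.99b)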


Furthermore, if the vector ${}_1\mathbf{y}_k$ is in $S_0$, then $({}_1\mathbf{y}_k \oplus \mathbf{u})$ ranges through every
vector in $S_0$ as $\mathbf{u}$ ranges through every vector in $S_0$. On the other hand, if
${}_1\mathbf{y}_k$ is in $S_1$, then $({}_1\mathbf{y}_k \oplus \mathbf{v})$ ranges through every vector in $S_0$ as $\mathbf{v}$ ranges
through every vector in $S_1$.

The suboptimum decoder determines which vector in $\{{}_1\mathbf{y}_i\}$ is closest in
Hamming distance to ${}_1\mathbf{r}$ and sets $\hat{x}_1 = 0$ if this vector is in $S_0$ and $\hat{x}_1 = 1$
if it is in $S_1$. Thus, if the signal ${}_1\mathbf{y}_k$ actually transmitted is in $S_0$, the
decision $\hat{x}_1$ is correct unless, for some vector $\mathbf{v}$ in $S_1$,

    $w[({}_1\mathbf{n} \oplus {}_1\mathbf{y}_k) \oplus \mathbf{v}] < w[({}_1\mathbf{n} \oplus {}_1\mathbf{y}_k) \oplus \mathbf{u}]$;  all $\mathbf{u}$ in $S_0$.

Here, as in Eq. 6.76, $w[\ ]$ denotes Hamming distance. But $({}_1\mathbf{y}_k \oplus \mathbf{v})$ is
in $S_1$ and $({}_1\mathbf{y}_k \oplus \mathbf{u})$ ranges through $S_0$. Thus an equivalent, but simpler,
statement is that $\hat{x}_1$ is correct when any vector in $S_0$ is transmitted unless,
for some vector $\mathbf{v}$ in $S_1$,

    $w[{}_1\mathbf{n} \oplus \mathbf{v}] < w[{}_1\mathbf{n} \oplus \mathbf{u}]$;  all $\mathbf{u}$ in $S_0$.    (6.100)

On the other hand, when the transmitted signal ${}_1\mathbf{y}_k$ is in $S_1$, the decision
is correct unless, for some vector $\mathbf{u}$ in $S_0$,

    $w[({}_1\mathbf{n} \oplus {}_1\mathbf{y}_k) \oplus \mathbf{u}] < w[({}_1\mathbf{n} \oplus {}_1\mathbf{y}_k) \oplus \mathbf{v}]$;  all $\mathbf{v}$ in $S_1$.

But now $({}_1\mathbf{y}_k \oplus \mathbf{u})$ is in $S_1$, whereas $({}_1\mathbf{y}_k \oplus \mathbf{v})$ ranges through $S_0$. Thus
Eq. 6.100 again describes the condition for error. The probability that
${}_1\mathbf{n}$ causes Eq. 6.100 to be satisfied, hence $\hat{x}_1$ to be in error, is independent of
$k$, which proves III.
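
The closure property of Eqs. 6.99a and b can be verified by enumeration for a small code. The sketch below assumes a hypothetical K = 3, v = 3 generator with a nonzero first connection vector; the names `segment`, `S0`, and `S1` are illustrative.

```python
from itertools import product

def segment(bits, g):
    """The Kv-digit truncated segment for input digits (x_1, ..., x_K)."""
    K, v = len(g), len(g[0])
    out = []
    for i in range(K):                       # column i + 1 of the segment
        for a in range(v):
            out.append(sum(bits[i - j] & g[j][a] for j in range(i + 1)) % 2)
    return tuple(out)

def xor(a, b):
    """Componentwise modulo-2 sum of two digit tuples."""
    return tuple(p ^ q for p, q in zip(a, b))

g_example = [(1, 1, 1), (1, 0, 1), (0, 1, 1)]      # hypothetical K = 3, v = 3
K = len(g_example)
S0 = {segment((0,) + rest, g_example) for rest in product((0, 1), repeat=K - 1)}
S1 = {segment((1,) + rest, g_example) for rest in product((0, 1), repeat=K - 1)}

# Eq. 6.99a: sums within one subset land in S0.
same_subset_ok = all(xor(a, b) in S0 for S in (S0, S1) for a in S for b in S)
# Eq. 6.99b: sums across the two subsets land in S1.
cross_subset_ok = all(xor(a, b) in S1 for a in S1 for b in S0)
# For y_k in S1, y_k XOR v sweeps out all of S0 as v sweeps S1.
sweep_ok = all({xor(y, v) for v in S1} == S0 for y in S1)
```

Because the error condition of Eq. 6.100 involves only the noise and the two subsets, this closure is what makes the error probability independent of the transmitted segment.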

Proof of IV. The derivations of the three preceding subsections
establish the bound

    $P[\mathcal{E}] \le L\, P[\mathcal{E}_1 \mid \mathbf{x}_0]$    (6.101)

on the over-all error probability for any particular convolutional encoder
and our suboptimum decoder. The remaining task is to determine the
attainable exponential behavior of $P[\mathcal{E}_1 \mid \mathbf{x}_0]$.

The number of codeword segments $\{{}_1\mathbf{y}_i\}$ entering into the decision $\hat{x}_1$
is $2^K$, which is still enormous when $K$ is large. As usual, we evade the
problem of calculating the error probability for any particular convolutional
code by resorting to a random-coding argument. We consider an
ensemble of communication systems, each of which uses a different
convolutional code, and calculate a bound on the mean value of $P[\mathcal{E}_1 \mid \mathbf{x}_0]$
over the ensemble. Most systems in the ensemble must afford a $P[\mathcal{E}_1 \mid \mathbf{x}_0]$
not substantially larger than the mean.

We have already noted that a convolutional coder is specified by its
generator sequence $\mathbf{g}$. In calculating $\overline{P[\mathcal{E}_1 \mid \mathbf{x}_0]}$, it is convenient to consider
the ensemble in which $\mathbf{g}$ is equally likely to be any one of the $2^N$ possible
binary sequences of length $N = Kv$. In other words, over the ensemble of
communication systems $\mathbf{g}$ is EL.

The first step in bounding $\overline{P[\mathcal{E}_1 \mid \mathbf{x}_0]}$ is to show that over the ensemble of
generator sequences any codeword in subset $S_1$ is also EL. Proof rests on
the observation that when the generator, defined in Eq. 6.86 as

    $\mathbf{g} = (\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_K),$

is EL, the $v$-component connection vectors $\{\mathbf{g}_j\}$ (defined in Eq. 6.80) are
necessarily EL and statistically independent. But any codeword in $S_1$, say
$\mathbf{v}$, corresponds to a coder input vector for which $x_1 = 1$. In accordance
with Eq. 6.90a and Fig. 6.36, $\mathbf{v}$ may therefore be written in the form

    $\mathbf{v} = 1 \cdot \mathbf{f}_1' \oplus x_2 \mathbf{f}_2' \oplus \cdots \oplus x_K \mathbf{f}_K' = (\mathbf{g}_1', \mathbf{g}_2', \ldots, \mathbf{g}_K'),$    (6.102)

in which

    $\mathbf{g}_1' = \mathbf{g}_1$
    $\mathbf{g}_2' = \mathbf{g}_2 \oplus (x_2 \mathbf{g}_1)$
    $\mathbf{g}_3' = \mathbf{g}_3 \oplus (x_2 \mathbf{g}_2 \oplus x_3 \mathbf{g}_1)$
    $\vdots$
    $\mathbf{g}_K' = \mathbf{g}_K \oplus (x_2 \mathbf{g}_{K-1} \oplus \cdots \oplus x_K \mathbf{g}_1).$

Each $\mathbf{g}_j'$ is the modulo-2 sum of the EL vector $\mathbf{g}_j$ and another vector (in
parentheses) of which it is statistically independent. Accordingly, the
$\{\mathbf{g}_j'\}$ also are both EL and statistically independent, which implies that
any codeword $\mathbf{v}$ in $S_1$ is EL.

When $\mathbf{x}_0$ is the message, so that ${}_1\mathbf{y} = {}_1\mathbf{y}_0$, a correct decision $\hat{x}_1$ is made
by each system in the ensemble for which at least one codeword in $S_0$ is a
more probable cause of ${}_1\mathbf{r}$ than is any codeword in $S_1$. In particular, an
error does not occur in systems such that

    $w[{}_1\mathbf{r} \oplus \mathbf{v}] > w[{}_1\mathbf{r} \oplus {}_1\mathbf{y}_0]$;  all $\mathbf{v}$ in $S_1$.    (6.103)

We overbound $P[\mathcal{E}_1 \mid \mathbf{x}_0]$ for each system in the ensemble, hence overbound
$\overline{P[\mathcal{E}_1 \mid \mathbf{x}_0]}$, by neglecting the fact that $\hat{x}_1$ may still be decoded correctly
(because of codewords in $S_0$ other than ${}_1\mathbf{y}_0$) even if Eq. 6.103 is not
satisfied. Thus $\overline{P[\mathcal{E}_1 \mid \mathbf{x}_0]}$ is bounded by the probability that at least one
of the $2^{K-1}$ EL vectors in $S_1$ is a more probable cause of ${}_1\mathbf{r}$ than is the
transmitted segment ${}_1\mathbf{y}_0$.

The remaining step is to recognize that this last probability is closely
related to our mean probability of error bound on $\overline{P[\mathcal{E} \mid m_0]}$ with a BSC
and a $Kv$-digit block parity-check code having $2^K$ equally likely messages.
The only mathematical distinction is that with a convolutional code there
are only $2^{K-1}$ EL vectors that can cause an error, whereas in the
block-code case there are $2^K - 1$. Without further ado, we have

    $\overline{P[\mathcal{E}_1 \mid \mathbf{x}_0]} \le 2^{K-1}\, \overline{P_2[\mathcal{E}]} < 2^K\, 2^{-N R_0'},$    (6.104a)

in which

    $N = Kv$    (6.104b)

and, from Eq. 6.70,

    $R_0' = 1 - \log_2 [1 + 2\sqrt{p(1 - p)}].$    (6.104c)

Since $K = N R_N$, the right-hand side of Eq. 6.104a equals $2^{-N[R_0' - R_N]}$;
averaging Eq. 6.101 over the ensemble and substituting this bound yields IV.
This completes the proof of IV.


6.4 SEQUENTIAL DECODING 

Although in principle both block and convolutional codes afford a
$\overline{P[\mathcal{E}]}$ that decreases exponentially with $K$, we have not yet addressed the
crucial problem of actually building decoders that achieve such error
performance. Specifically, the suboptimum decoder considered thus far
is not realizable for large $K$ because its procedure for decoding each
successive input bit $x_h$, $h = 1, 2, \ldots, L$, involves comparison of the
received message span ${}_h\mathbf{r}$ with $2^K$ $K$-branch codeword segments. The
adoption of a "sequential" procedure for determining each $\hat{x}_h$ evades this
exponential blow-up and permits us to specify a decoder that achieves an
exponentially small error probability while remaining realizable even
when $K$ is large.

In this section we introduce sequential decoding by a heuristic discussion
of its application to the binary symmetric channel. We then detail a
specific decoding algorithm due to Fano [28]. The algorithm is extended to
more general channels and analyzed mathematically in Appendix 6A.
Engineering applications are discussed in Section 6.5.

In its simplest form a sequential decoder proceeds in much the same
way as our suboptimum decoder. Both decide on each successive message
input bit in turn, one after the other, as indicated in Fig. 6.37. For both
the problem of decoding $x_h$ is equivalent to the problem of decoding $x_1$,
provided that the $h$th starting node is correct. The two decoders differ
distinctly, however, in how the decisions are determined.

Tree Searching

We have already remarked that the convolutional coder input $\mathbf{x}$ may
be regarded as a set of instructions that direct the transmitter along some
path through the code tree. Let ${}_1\mathbf{y}$ represent the first $N = Kv$ digits
encountered along that path and let ${}_1\mathbf{n}$ denote the first $N$ noise digits. If
we assume initially that the BSC is noiseless, so that ${}_1\mathbf{n} = \mathbf{0}$, then
${}_1\mathbf{r} = {}_1\mathbf{y} \oplus {}_1\mathbf{n} = {}_1\mathbf{y}$. In this trivial case a decoder provided with a replica of
the encoder can easily trace out the first $K$ branches of the path designated
by $\mathbf{x}$. The decoder starts at the first node of the code tree, generates both
branches diverging therefrom, and follows the one that agrees with the
first $v$ digits of ${}_1\mathbf{r}$. Having thus been directed to a particular second-level
node of the code tree, the decoder again generates both branches diverging
therefrom and follows whichever branch agrees with the second $v$ digits
of ${}_1\mathbf{r}$ to a third-level node. Continuing in this way, the decoder rapidly
determines the first $K$ digits of $\mathbf{x}$. The procedure works without difficulty
as long as the two branches diverging from any node of the code tree
differ by at least one digit. It is clear from Fig. 6.32 that such a difference
may be guaranteed by connecting the first x-register stage to the first
modulo-2 adder, that is, by making $g_{11} = 1$.

When the BSC is noisy, ${}_1\mathbf{n}$ is not in general $\mathbf{0}$ and the procedure just
described is not sufficient even to decode the first message digit, $x_1$. But
a simple modification is appealing and may be used to decode $x_1$ with
high reliability. If neither branch stemming from an intermediate node
coincides with the corresponding $v$ digits of ${}_1\mathbf{r}$, the decoder first follows
whichever branch agrees best. Clearly, when more than $v/2$ transitions
occur in the transmission of a branch, such a decoder initially proceeds
to an incorrect node. Having once made this mistake, however, in
subsequent branch comparisons the decoder is unlikely to find any path
stemming from this incorrect node which agrees well with the remaining
digits of ${}_1\mathbf{r}$. For example, with the truncated $K = 4$, $v = 5$ code tree of
Fig. 6.39, assume

    $\mathbf{x} = (1, 1, 0, 1)$    (6.105a)

    ${}_1\mathbf{n} = (10010, 00111, 00000, 00100),$    (6.105b)

so that the transmitted vector is

    ${}_1\mathbf{y} = (11111, 10101, 01101, 11011)$    (6.106)

and

    ${}_1\mathbf{r} = (01101, 10010, 01101, 11111).$    (6.107)

In this case, as shown in the figure, the decoder follows the correct path
to node (a), thence the incorrect path to node (b). But none of the paths
extending beyond node (b) agrees with ${}_1\mathbf{r}$ in nearly so many coordinates
as does the correct path ${}_1\mathbf{y}$. When $v$ is properly chosen with regard to the
BSC transition probability $p$, the effect of a wrong turn is likely to be
readily noticeable as the decoder attempts to penetrate deeper into the
code tree.
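
The arithmetic of this example is easy to check mechanically. The sketch below uses the digits of Eqs. 6.105b and 6.106 as given; the variable and function names are illustrative only.

```python
# Digits of Eqs. 6.105b and 6.106, one string per branch (v = 5).
y_branches = ["11111", "10101", "01101", "11011"]   # transmitted segment
n_branches = ["10010", "00111", "00000", "00100"]   # channel noise

def xor_bits(a: str, b: str) -> str:
    """Modulo-2 sum of two equal-length digit strings."""
    return "".join("1" if p != q else "0" for p, q in zip(a, b))

r_branches = [xor_bits(y, n) for y, n in zip(y_branches, n_branches)]
errors_per_branch = [n.count("1") for n in n_branches]
v = 5
# A branch invites a wrong turn when more than v/2 of its digits are changed.
risky = [2 * e > v for e in errors_per_branch]
```

Only the second branch suffers more than $v/2$ transitions, which is consistent with the decoder reaching node (a) correctly and then turning wrong toward node (b).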

The idea of sequential decoding is to program the decoder to act much
as a driver who occasionally makes a wrong choice at a fork in the road,
but quickly discovers his error, goes back, and tries the other. The
decoder's objective is to construct a path $K$ branches long extending all
the way through the truncated code tree in Fig. 6.39 to one of its $2^K$ terminal
nodes. As soon as such a path is found $\hat{x}_1$ is determined in accord with
the first branch of that path. The observed $N$-digit span of the received
sequence is then shifted $v$ digits to the right, as indicated in Fig. 6.37, and
the entire decoding procedure is reiterated to determine $\hat{x}_2$. An error in
decoding $x_1$ results if and only if a wrong turn at the first node is not
recognized before the decoder penetrates $K$ branches beyond it.



We now describe how a sequential decoder recognizes wrong turns in
decoding $x_1$. Let us assume that the decoder has penetrated $l$ branches
into the code tree, $0 < l < K$. Let $d(l)$ denote the total number of differences
(the Hamming distance) observed by the decoder between the
(tentative) path it is following, say $\mathbf{y}^*(l)$, and the corresponding $l$-branch
segment of the received sequence, say $\mathbf{r}(l)$:

    $d(l) = w[\mathbf{y}^*(l) \oplus \mathbf{r}(l)].$    (6.108)

As the sequential decoder penetrates branch by branch deeper into the code
tree along the tentative path, it maintains a running count of $d(l)$. After each
successive penetration the decoder compares $d(l)$ against a discard criterion
function, $k(l)$. If $d(l)$ ever exceeds $k(l)$, the tentative path is discarded as
too improbable. The decoder then backs up to the nearest unexplored
branch for which $d(l) \le k(l)$ and again starts moving forward as far as
the discard criterion function $k(l)$ permits. The decoder keeps track of
the branches it has explored and thereby avoids needless retracing of any
branch.

From the point of view of decoder implementation, a convenient
discard criterion $k(l)$ is a straight line, as shown in Fig. 6.40. The law of
large numbers states that the fraction of digit transitions introduced by
the BSC will approximate the channel transition probability $p$ when $l$ is
large. When $\mathbf{y}^*(l)$ is correct, we therefore anticipate that $d(l)$ will oscillate
around a straight line of slope $pv$. On the other hand, when $\mathbf{y}^*(l)$ departs
from the starting node ($l = 0$) along the incorrect branch, we anticipate that
$d(l)$ will oscillate about† a line of slope $\frac{1}{2}v$. We choose $k(l)$ to be a straight
line of intermediate slope $p'v$, $p < p' < \frac{1}{2}$. Since it is not unlikely that a
burst of noise will cause many of the initial digits of ${}_1\mathbf{r}$ to be in error, $k(l)$
is taken to have a nonzero intercept at $l = 0$.

Basic Concepts 

The use of suitable discard criteria in sequential decoding makes it
possible for the decoder to recognize quickly that it is following an
incorrect path. For example, in Fig. 6.41 we show two incorrect paths,
the first of which diverges from the correct path at the starting node and
the second at an intermediate node. Both cross $k(l)$ soon thereafter.
The advantage from a computational point of view (and this constitutes
the first basic concept of sequential decoding) is that discarding a path
after $l$ branches also effects the discard of the $2^{K-l}$ other paths in the
truncated code tree that diverge therefrom (see Fig. 6.39). The crucial
attribute of convolutional codes is that if wrong turns can be discovered
quickly enough the saving in number of computations (measured in
terms of the number of branches explored) is exponential.

Figure 6.41 Typical plots of $d(l)$ along the correct and two incorrect paths.

† The reason is that in good codes the Hamming distance between correct and typical
incorrect paths approximates one half the codeword length.

It is clear that the average number of computations is reduced by
making $k(l)$ more stringent, that is, by choosing both its slope and the
intercept $k(0)$ to be small. On the other hand, if the discard criterion is
too stringent, channel noise may cause every sequence in the code tree
(including the correct one) to cross $k(l)$ at values of $l$ less than $K$. In such
a case the decoder that we have described is unable to construct a path to
any terminal node of Fig. 6.39, hence is unable to decode $x_1$.



Figure 6.42 A set of equally spaced discard criteria, each with slope p'v. 


Fortunately, there is a way out of this dilemma; this way out constitutes
the second basic concept of sequential decoding. Let us start with a
stringent criterion, such as the function $k_1(l)$ shown in Fig. 6.42. Most
often the correct path (or at least some sequence whose first branch is
correct) will be retained and $x_1$ decoded with only a small amount of
computation. On those less frequent occasions when all code tree sequences
are discarded, a less stringent criterion, such as the function $k_2(l)$ in the
figure, is invoked. If, as might happen still less frequently, all sequences
are discarded with $k_2(l)$, criterion $k_3(l)$ is invoked, and so forth.

By successively relaxing the discard criteria some $K$-branch path in
the code tree will eventually be retained; with high probability the first
branch of this retained path will be correct. Of course, the looser criteria
require more computation than $k_1(l)$, but since the looser criteria are
used less frequently the increased computational load that they imply
does not necessarily have a disastrous effect on the average computational
requirement, again measured in terms of the number of branches explored.
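
The two ideas described above, discarding a tentative path as soon as $d(l)$ exceeds a straight-line criterion and relaxing the criterion when every path is discarded, can be sketched as follows. This is a simplified illustration under stated assumptions, not the Fano algorithm: the search restarts from the first node for each criterion, the generator and all function names are hypothetical, and `slope` stands for $p'v$.

```python
from typing import List, Optional, Sequence, Tuple

def branch(path: Sequence[int], bit: int, g) -> List[int]:
    """The v digits on the branch taken when `bit` follows `path`."""
    K, v = len(g), len(g[0])
    bits = ([0] * K + list(path) + [bit])[-K:]       # register contents, newest last
    return [sum(bits[K - 1 - j] & g[j][a] for j in range(K)) % 2 for a in range(v)]

def search(r: Sequence[int], g, k0: float, slope: float) -> Tuple[Optional[List[int]], int]:
    """Depth-first search for a K-branch path with d(l) <= k0 + slope * l throughout."""
    K, v = len(g), len(g[0])
    explored = 0

    def extend(path: List[int], d: int) -> Optional[List[int]]:
        nonlocal explored
        l = len(path)
        if l == K:
            return path
        received = r[l * v:(l + 1) * v]
        options = []
        for bit in (0, 1):
            diff = sum(p != q for p, q in zip(branch(path, bit, g), received))
            options.append((diff, bit))
        for diff, bit in sorted(options):             # best-agreeing branch first
            explored += 1
            if d + diff <= k0 + slope * (l + 1):      # otherwise discard this branch
                found = extend(path + [bit], d + diff)
                if found is not None:
                    return found
        return None                                   # back up to the previous node

    return extend([], 0), explored

def decode_first_digit(r, g, intercepts: Sequence[float], slope: float):
    """Try successively looser criteria until some K-branch path is retained."""
    total = 0
    for index, k0 in enumerate(intercepts):
        path, explored = search(r, g, k0, slope)
        total += explored
        if path is not None:
            return path[0], index, total
    return None, len(intercepts), total

g_example = [(1, 1, 1), (1, 0, 1), (0, 1, 1)]         # hypothetical K = 3, v = 3
r_example = [0, 1, 1,  1, 0, 1,  1, 0, 0]             # 111 101 100 with first digit flipped
result = decode_first_digit(r_example, g_example, intercepts=[0, 1, 2], slope=0.0)
```

In this run the tightest criterion discards every path after two branch computations, and the second criterion retains the correct path after three more; the returned tuple holds the decoded first digit, the index of the criterion that succeeded, and the total number of branches explored.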

These two basic concepts, early discard of unlikely paths and application
of a sequence of criteria, underlie all sequential decoding
procedures. A decoding algorithm operating essentially as described
above, but modified to curtail vastly the number of computations by
exploiting dependencies between successive decisions $\hat{x}_h$, has been
proposed [80] and tested [65, 54]. Although the modifications are not susceptible
to mathematical analysis, the experimental results (discussed in Section
6.5) demonstrate that the resulting algorithm is effective. A more sophisticated
algorithm, which incorporates the essence of these modifications
and extends them in an intuitively satisfying way, has been devised by
Fano [28]. The Fano algorithm not only affords more flexible and efficient
implementation, but also permits extensive analysis (see Appendix 6A).
We consider this algorithm in detail.

The Fano Algorithm 

In this discussion of the Fano algorithm we continue to restrict attention
to the BSC. An extension to more general channels is provided in
Appendix 6A. Explication of the algorithm is simplified by adoption of
the "tilted" distance function

    $t(l) = d(l) - p'vl$    (6.109)

in lieu of the Hamming distance function, $d(l)$, of Eq. 6.108. The corresponding
discard criteria, hereafter called thresholds, become horizontal
lines with spacing $\Delta$, as shown in Fig. 6.43a. When $\mathbf{y}^*(l)$ is the correct
path, $t(l)$ usually approximates the negative quantity $(p - p')vl$ and tends
to decrease as $l$ increases. When $\mathbf{y}^*(l)$ is incorrect, $t(l)$ behaves typically as
$(\frac{1}{2} - p')vl$ and tends to increase as $l$ increases.
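
As a small illustration of Eq. 6.109, the sketch below computes $t(l)$ from the number of disagreements observed on each branch. The function name and the value $p' = 0.3$ are assumptions made for the example; the per-branch counts are the weights of the noise blocks in Eq. 6.105b, that is, the disagreements along the correct path of that example.

```python
from itertools import accumulate
from typing import List, Sequence

def tilted_distance(errors_per_branch: Sequence[int], p_prime: float, v: int) -> List[float]:
    """t(l) = d(l) - p' v l for l = 1, 2, ..., with d(l) the running Hamming distance."""
    d = list(accumulate(errors_per_branch))
    return [d_l - p_prime * v * l for l, d_l in enumerate(d, start=1)]

# Disagreements per branch along the correct path of Eq. 6.105b;
# p' = 0.3 is an illustrative choice, not a value from the text.
t_values = tilted_distance([2, 3, 0, 1], p_prime=0.3, v=5)
```

With these numbers the tilt $p'v = 1.5$ is subtracted once per branch, so a path that accumulates fewer than 1.5 disagreements per branch drifts downward.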

Before detailing the decoding algorithm, it is helpful to introduce 
additional terminology. Given the received vector r, Eqs. 6.108 and 
6.109 specify a tilted distance t(l) for each of the 2^l l-branch paths in the 
code tree, l = 1, 2, ... . A node of the tree at depth l is assigned a t-value 
equal to the t(l) of the path leading to that node; the node at the origin of 
the tree is assigned t-value zero. The set of t-values implies a mapping of 
the code tree into a received distance tree, as indicated in Fig. 6.43b: the 
nodes are connected together as in the code tree, but the ordinate of each 
node is taken to be its t-value. 

A node of the received distance tree is said to satisfy all thresholds that 
lie on or above it and to violate all thresholds that lie beneath it. The 
tightest threshold satisfied by a node is the one that lies just on or above it. 
Of the nodes diverging from any given node, the one with smallest t-value 
is called the best and the one with largest t-value is called the worst. The 
definitions are exemplified in the figure. 

Figure 6.43a Thresholds (discard criteria) for use with the Fano algorithm, and 
typical behavior of t(l) for correct and incorrect paths. 

Figure 6.43b Received distance tree. The node labeled 4 satisfies thresholds 2Δ, 3Δ, ..., 
and violates thresholds Δ, 0, -Δ, .... The tightest threshold satisfied by node 4 is 2Δ. 
The best node diverging from 4 is labeled 5. The worst node diverging from 4 is labeled 6. 

Sequential decoders consider one node of the received distance tree at a 
time. We may visualize that this node is designated by a (movable) 
search node pointer: the node being considered in Fig. 6.43b is the one 
labeled 4. In addition, the Fano decoder maintains a running threshold, 
denoted T and equal to kΔ, where k is a (variable) integer. We say that 
the running threshold is tightened when k is assigned so that T is the 
tightest threshold satisfied by the search node, that is, by the node then 
being considered. 
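The notions of satisfying a threshold and of tightening can be written compactly. The sketch below assumes that thresholds sit at integer multiples kΔ and that a node satisfies every threshold lying on or above its t-value; the function names are illustrative only.

```python
import math


def satisfies(t_value, k, delta):
    """True if a node with the given t-value lies on or below threshold k*delta."""
    return t_value <= k * delta


def tightest_threshold_index(t_value, delta):
    """Smallest integer k such that the threshold k*delta is satisfied."""
    return math.ceil(t_value / delta)


# A node lying between delta and 2*delta (like node 4 of Fig. 6.43b)
# satisfies 2*delta, 3*delta, ... and violates delta, 0, -delta, ...
k = tightest_threshold_index(1.5, 1.0)
print(k, satisfies(1.5, k, 1.0), satisfies(1.5, k - 1, 1.0))
```

Tightening the running threshold then amounts to setting k to `tightest_threshold_index` of the search node's t-value.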

Given the received vector r, the Fano decoder searches for the correct 
path by moving its search node pointer through the received distance 
tree. The pointer can move forward or backward, but only to an adjacent 
node — that is, only to a node connected to the existing search node by a 
single branch. The pointer movement is controlled by the flow diagram 
of Fig. 6.44. An essential feature of the algorithm is that the pointer is 
never moved either forward or backward unless this can be accomplished 
without violating the running threshold; the running threshold is raised 
only when necessary to accommodate such a move. 

The operation of the decoder is best explained by example. Consider 
Fig. 6.45. The decoder starts its tree search at the initial node, labeled 0. 
The initial value of the running threshold T is zero. In accordance with 
Fig. 6.44 the decoder looks forward to the node labeled 1. Since the t-value 
of this node does not violate T, the search node pointer is then 
moved forward to node 1. By this movement the decoder makes a tentative 
decision, x̂_1; the decision is 0 when the node labeled 1 corresponds 
to x_1 = 0 and 1 when this node corresponds to x_1 = 1. 

With the search node pointer on node 1, the running threshold T = 0 
is as tight as possible. The decoder therefore next looks ahead to node 2; 
it moves the pointer to node 2 after noting that the running threshold is 
not violated, thereby making a tentative decision x̂_2. At node 2 the decoder 
is able to tighten the running threshold and sets T = -Δ. This 
procedure of looking, moving the pointer, and tightening T continues 
until in looking forward from node 4 to node 5 the decoder observes a 
violation of the running threshold T = -2Δ. The decoder reacts by 
looking back to node 3. Since the running threshold T = -2Δ is not 
violated by node 3, the pointer is moved back. The effect is to erase the 
tentative decision x̂_4. On the step forward to node 6 the complementary 
choice of x̂_4 is made. The remainder of Fig. 6.45 is self-explanatory. 
The search path y*(l) is specified at any instant by the tentative decisions 
x̂_1, x̂_2, ..., x̂_l, which together determine the position of the search node 
pointer. 
Enough thought will make it clear that the flow diagram of Fig. 6.44 
will direct a successful search through any tree and eventually trace out 
the correct path so long as t(l) for the correct path ultimately decreases, 
whereas t(l) for every incorrect path ultimately increases. In particular, 
the algorithm cannot become trapped in an endless loop, continually 
searching the same nodes with the same thresholds. 

Figure 6.44 Basic flow diagram for the Fano algorithm. In Appendix 6A we consider 
general convolutional codes with μ nodes diverging from each node of the code tree, 
μ ≥ 2. (See Fig. 6A.1.) The flow diagram above is so worded that it pertains to these 
general codes for which the entire corresponding set of μ nodes in the received distance 
tree is considered to be ordered from best to worst, according to increasing t-value. 
Two or more such nodes with equal t-values may be ordered relative to each other in 
any specific way. 

It is helpful to note that in searching for a path on which t(l) ultimately 
decreases as l increases the decoder examines all accessible nodes lying 
beneath a given threshold before increasing the running threshold. After 
an increase no further change in T is permitted until either (i) all accessible 
paths are found to violate the new running threshold, necessitating another 
increase of Δ in the value assigned to T or (ii) the search node pointer 
arrives at a node that it has never reached before. Further properties of 
the algorithm are stated in Appendix 6A. The analysis performed there 
permits a sensible choice to be made for the design parameters Δ and p'. 

Figure 6.45 Example of tree search with algorithm of Fig. 6.44. For the search 
detailed below t-values are calculated only for those nodes of the tree actually shown in 
the figure. 

The basic flow diagram of Fig. 6.44 requires one elaboration to permit 
efficient implementation. The box labeled “Is pointer at node for first 
time?” can, of course, be realized by providing a sufficiently large memory. 
But the number of nodes examined by the decoder is in general exceedingly 
large, and such a memory is prohibitively costly. An ingenious alternative 
proposed by Fano uses only a single binary variable, which we designate θ, 
to determine when to tighten the running threshold. This additional 
variable and the accompanying logic are included in the augmented flow 
diagram of Fig. 6.46. The variable θ, initially 0, is set equal to 1 immediately 
following observation of a running threshold violation on a forward 
look. As long as θ remains equal to 1 the algorithm prevents tightening 
of T. As soon as the search node pointer moves to a new node (one never 
reached before) θ is reset to 0 and tightening is again permitted. 

Figure 6.46 Augmented flow diagram for the Fano algorithm. 


A new node can be encountered only on a forward move and is easily recognized. 
First, a node is new if it violates the threshold, say, T_0 = T - Δ, 
just beneath the running threshold T: for example, when node 11 of 
Fig. 6.45 is reached, it violates the previous value T_0 = -3Δ. Second, a 
node that satisfies T_0 is new if reached by a forward move from a node that 
violates T_0: for example, node 10 of Fig. 6.45 satisfies T_0 = -3Δ but 
is reached from node 7, which violates this running threshold value. 
(Running threshold T was increased from -3Δ to -2Δ before the step 
back from node 8 to node 7.) The search node pointer can arrive at a new 
node only in one of these two ways; otherwise, the node would have 
been accessible with running threshold value T_0, hence examined previously. 
The algorithm of Fig. 6.46 recognizes both possibilities and reacts 
by setting θ = 0. 

Figure 6.47 Example of tree search with algorithm of Fig. 6.46. 
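The search rules and the new-node test can be sketched in Python for a binary tree. This is an illustrative sketch under stated assumptions, not a transcription of the flow diagrams of Figs. 6.44 and 6.46. The rate-1/2 generators `G`, the helper names, and the parameter values are hypothetical choices made for the example. In place of the variable θ, the sketch permits tightening only when the node just left violates T - Δ (the second case above); in the first case the running threshold is already the tightest one satisfied, so tightening would change nothing.

```python
import math
from fractions import Fraction

G = (0b111, 0b101)  # hypothetical rate-1/2 generators, for illustration only
SPAN = 3            # number of input bits spanned by each generator


def branch(bits, l):
    """Encoder output digits on branch l (0-based) for the input bits so far."""
    out = []
    for g in G:
        digit = 0
        for j in range(SPAN):
            if (g >> j) & 1 and l - j >= 0:
                digit ^= bits[l - j]
        out.append(digit)
    return out


def encode(bits):
    """Concatenate the branch digits for a whole input sequence."""
    return [d for l in range(len(bits)) for d in branch(bits, l)]


def fano_decode(r, L, p_tilt, delta, max_steps=100_000):
    """Search the received distance tree to depth L; return (decisions, steps)."""
    v = len(G)
    path, t, rank = [], [Fraction(0)], []  # decisions, t-values, 0=best/1=worst
    k = 0      # running threshold is T = k * delta
    look = 0   # which successor to look at next: 0 = best, 1 = worst
    steps = 0
    while len(path) < L:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("search budget exhausted")
        l = len(path)
        # t-values of the two successors, ordered from best to worst.
        succ = sorted(
            (t[-1]
             + sum(a != b for a, b in zip(branch(path + [bit], l),
                                          r[v * l:v * l + v]))
             - p_tilt * v, bit)
            for bit in (0, 1))
        t_next, bit = succ[look]
        if t_next <= k * delta:
            # Forward move; tighten only if the node being left violates T - delta.
            leaving_violates = t[-1] > (k - 1) * delta
            path.append(bit)
            t.append(t_next)
            rank.append(look)
            if leaving_violates:
                k = math.ceil(t_next / delta)
            look = 0
            continue
        # Forward look violates T: look back.
        while True:
            if not path or t[-2] > k * delta:
                k += 1          # cannot move back either: raise T by delta
                look = 0
                break
            path.pop()
            t.pop()
            if rank.pop() == 0:
                look = 1        # left the best successor: try the worst next
                break
            # left the worst successor: look back again
    return path, steps


message = [1, 1, 0, 0, 0, 0]
received = encode(message)
received[2] ^= 1  # one channel transition in the second branch
print(fano_decode(received, len(message), Fraction(1, 4), Fraction(1)))
```

For this particular received sequence the search makes one wrong tentative decision, backs up once, and then follows the transmitted path. Exact `Fraction` arithmetic is used only to keep comparisons against the thresholds free of rounding; like any sequential decoder, the sketch is not guaranteed to return the transmitted sequence for every error pattern.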

As with Fig. 6.44, an understanding of the flow diagram of Fig. 6.46 is 
obtained most readily by example. The search detailed in Fig. 6.47 is 
self-explanatory. 

A block diagram of a Fano decoder is shown in Fig. 6.48. Received 
digits are read in parallel into the r-register of the decoder one branch 
( v digits) at a time. In practice, branches are received at uniform time 
intervals. As each new branch is received, the contents of the r-register 
are shifted toward the right. The oldest branch is shifted out and lost 
whenever a new branch is entered. 

The decoder contains a replica of the convolutional encoder at the 
transmitter. The path hypothesis, y*(l), is generated in this replica branch 
by branch, matched with the corresponding received branch, and t(l) is 
updated. The penetration index l, represented by the depth of search 
pointer in Fig. 6.48, is increased (the pointer moved left) or decreased 
(the pointer moved right) in accordance with the search algorithm. In 
addition, the pointer and x̂-register shift one step to the right each time a 
new branch is received. 

The input bits hypothesized by the encoder replica in generating y*(l) 
are written in the x̂-register. Thus the x̂-register positions to the right of 
the pointer are full and those to the left are empty. The decoded output 
vector x̂ is the sequence of digits shifted out of the rightmost stage of the 
x̂-register. Note that the x̂-register is equivalent to the search node pointer 
considered in connection with the flow diagram of Fig. 6.44. In contrast, 
the depth of search pointer of Fig. 6.48 indicates where the received branch 
being observed by the decoder is located within the r-register. 

The ambulations of the pointer in Fig. 6.48 depend on the received 
data rate, the computational speed of the decoder, and the details of the 
received noise. When most of the received digits are correct, very little 
searching is necessary to extend y*(l). In this case the pointer usually 
hovers near the input end of the decoder and waits for new data. On the 
other hand, if the number of erroneous received digits is too large, a vast 
amount of searching is involved and the pointer is dragged to the right 
as new branches are fed into the decoder. The decoder is in trouble if the 
depth of search pointer is forced to the output end of the decoder. The 
implications of this event are discussed in Section 6.5. 


Figure 6.48 Block diagram of Fano decoder for the case v = 3. 


6.5 SUMMARY OF RESULTS 

In this section we summarize certain theoretical analyses of sequential 
decoding and discuss the major experimental results that have been 
obtained. 

Analytical Results 

Precise analysis of an actual Fano decoder is complicated by the fact 
that the size, Γ, of the r-register in Fig. 6.48 is finite. Meaningful insight 
into decoder performance, however, is gained by assuming that Γ is so 
large that the search position pointer is never forced to the output end of 
the r-register. We use this assumption in Appendix 6A to bound a quantity 
analogous to the mean probability $\overline{P[\mathcal{E}]}$ considered in connection 
with the suboptimum decoder of Section 6.3. The overhead bar, as usual, 
signifies expectation with respect to a suitable ensemble of codes. In 
particular, for coders like that of Fig. 6.32 operating over a binary symmetric 
channel, Eq. 6A.18 implies† that 

$$\overline{P[\mathcal{E}]} < A_0\, 2^{-K[(R_0'/R_N) - 1]}, \qquad R_N < R_0', \tag{6.110}$$

in which the value of the coefficient $A_0$ is given by 

$$A_0 = \frac{4}{1 - 2^{-[(R_0'/R_N) - 1]}} \tag{6.111}$$

and R_0' is the exponential bound parameter evaluated for the BSC 
(see Eq. 6.70). As discussed in the appendix, $L\,\overline{P[\mathcal{E}]}$ may be interpreted 
as indicative of the probability of one or more errors in decoding a long 
sequence of L message symbols with the Fano algorithm when the r-register 
is infinite. 

† Equations 6.110 and 6.111 follow from Eqs. 6A.18 and 6A.17b by specializing to the 
parameter values μ = 2, v = 1/R_N, Δ = 2. 

Under these same conditions, specialization of Eq. 6A.27 implies that a 
quantity B (interpreted in the appendix as indicative of the mean number 
of branches searched by the algorithm per message symbol decoded) is 
bounded by 

$$B < 3A_0^2, \qquad R_N < R_0'. \tag{6.112a}$$

Thus the computation bound is independent of K, but varies as 

$$3A_0^2 \approx \frac{100}{[(R_0'/R_N) - 1]^2} \tag{6.112b}$$

for R_N only slightly less than R_0'. 
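As a quick numerical illustration of the approximate form of the computation bound, $100/[(R_0'/R_N) - 1]^2$ (a worked evaluation added here, taking that approximate form as given), consider a rate of $R_N = 0.9R_0'$:

```latex
\frac{R_0'}{R_N} - 1 = \frac{1}{0.9} - 1 = \frac{1}{9},
\qquad
\frac{100}{[(R_0'/R_N) - 1]^2} = 100 \times 81 = 8100 .
```

This is the value of the bound quoted for $R_N = 0.9R_0'$ in the discussion of simulation results below.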

Although the particular bounds presented here have been chosen for 
ease of derivation and simplicity of form rather than for tightness, they 
place in evidence three important characteristics of sequential decoding: 

1. For R_N < R_0' the code constraint length K can be increased without 
increasing the bound on B. 

2. For R_N < R_0', $\overline{P[\mathcal{E}]}$ decreases exponentially with an exponent that 
is linear in K. 

3. Although B and $\overline{P[\mathcal{E}]}$ are well behaved with regard to K, both 
bounds blow up as R_N approaches R_0'. 

Thus by increasing K it is possible to obtain as small an error probability 
as desired without incurring a concomitant increase in the mean computational 
speed demanded of the decoder; however, the channel imposes 
an upper limit, R_0', on the maximum rate at which this kind of performance 
can be obtained. 

These three characteristics are believed to be fundamental attributes 
of all sequential decoding procedures; they are reflected in all bounds 
obtained and in all experiments reported thus far. 48, 78, 91 



System Evaluation 

Shannon’s original and revolutionary proof that channel disturbances 
fundamentally limit the rate, but not the accuracy, of communication 
was first published in 1948. Since then a great deal of effort has been 
devoted to the problem of actually achieving improved communication 
reliability, and many interesting coding and decoding schemes have been 
devised. Most of them are well documented, 31, 57, 66, 96 and we shall not 
discuss them here. 

The relative desirability of different solutions to specific communication 
problems depends critically on the engineering objective. For example, 
consider a bandlimited Gaussian channel and Shannon’s bounding 
reliability exponent R_0*, plotted in Fig. 5.18. If moderate accuracy at 
low data rates (R_N/R_0* << 1) suffices, the problem can be resolved by 
appropriate choice of a modulator and demodulator; coding is not 
needed. If the objective is to obtain high accuracy at low data rates, 
easily implemented schemes such as threshold decoding 57 are indicated. 
The most difficult problem arises when we simultaneously require high 
accuracy and high data rate, R_N/R_0* < 1. In this case powerful codes 
(K >> 1) and complex terminal equipment are unavoidable, and the 
comparison of different coding techniques becomes especially intricate. 

A particularly interesting class of codes affording large K is the 
Bose-Chaudhuri-Hocquenghem 15 (hereafter abbreviated BCH) codes. For 
any integer m there is a binary BCH code with word length N = 2^m - 1 
which contains 2^K codewords and is guaranteed to correct any combination 
of t or fewer BSC channel transitions, with K ≥ N - mt. Decoding 
schemes that are applicable to these codes whenever the number 
of transitions is less than or equal to t have been discovered by Peterson 66 
and also by others†; these schemes require a number of computations 
which grows as a small power of t. 
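The guaranteed BCH parameters are easy to tabulate. The following minimal sketch, with an illustrative function name, returns the word length and the lower bound on K for given m and t; the actual K of a particular code may exceed this bound.

```python
def bch_parameters(m, t):
    """Return (N, K_min) for a binary BCH code: N = 2^m - 1, K >= N - m*t."""
    n = 2 ** m - 1
    k_min = n - m * t
    if k_min <= 0:
        raise ValueError("the bound N - m*t is not positive for these m, t")
    return n, k_min


for m, t in [(4, 2), (5, 3), (8, 10)]:
    n, k_min = bch_parameters(m, t)
    print(f"m={m}, t={t}: N={n}, K >= {k_min}, rate >= {k_min / n:.3f}")
```

For m = 4 and t = 2 this gives N = 15 and K ≥ 7.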

As an example of one way in which different decoding schemes can be 
compared, we now consider the performances achievable over an additive 
white Gaussian noise channel with binary BCH codes and with binary 
convolutional codes and sequential decoding. Antipodal signaling and 
symmetric two-level receiver quantization are assumed. The resulting 
BSC transition probability is 

$$p = Q\!\left(\sqrt{\frac{2E_b R_N}{N_0}}\right), \tag{6.113}$$

in which E_b/N_0 is the energy-to-noise ratio per message input bit and 
R_N = K/N is the data rate in bits per transmitted symbol. 

† See, for example, G. D. Forney, Jr., “Concatenated Codes,” Sc.D. Thesis, M.I.T., 
June 1965. See also D. Gorenstein and N. Zierler, “A Class of Error-Correcting Codes 
in p^m Symbols,” J. SIAM 9, 207-214, June 1961. 

With the aid of a digital computer, the minimum value of E_b/N_0 
required for any particular BCH code in order to achieve a stated error 
probability per bit, P_x, can be determined. We define P_x = (1/K) $P[\mathcal{E}]$, in 
which $P[\mathcal{E}]$ is the BCH block error probability. The results† are plotted 
as a function of R_N for P_x = 10^-5 and 10^-8 in Figs. 6.49a and b. 

Figure 6.49 Minimum value of E_b/N_0, as a function of R_N, for BCH codes of length 
N = 15, 31, 63, 127, 255. Only the dots represent data points. For small R_N the 
solid (sequential decoding) curve can be lowered 2 db through use of close-grained, 
rather than binary, detector quantization (see Eq. 6.114b and Fig. 6.21). 

For purposes of comparison with sequential decoding we can refer to 
Eq. 6.70 and determine the value of E_b/N_0 required to obtain R_N/R_0' = 
0.9. The results are also plotted in Figs. 6.49a and b. Equations 6.110 
and 6.111 state that with these values of E_b/N_0 an arbitrarily small error 
probability can be obtained by choosing K large enough. We shall soon 
see that when decoding sequentially it is sensible to choose K so large that 
the error probability will be truly negligible and to restrict R_N < 0.9R_0'. 

From a system-engineering point of view curves such as those in Fig. 
6.49, although instructive, are by no means a sufficient basis on which to 
decide among contrasting design approaches. We must also take into 
account the fact that different decoding schemes have different operational 
and implementational advantages and disadvantages. For instance, so far 
we have considered only the average computational demands with se- 
quential decoding; in the next subsection we consider also the variability 
of the number of computations. Although BCH codes in general require a 
larger average amount of decoding computation than sequential decoding, 
the computational demand in the BCH case is much less variable, which 
is a distinct advantage in many applications. 

The greatest asset of sequential decoding is the scope of its applicability. 
We have already mentioned that sequential decoding procedures can be 
applied to a broad class of communication channels. Specifically, the 
class includes, but is not restricted to, every constant memoryless discrete 
channel — that is, every channel for which the statistical connection 
between input and output symbols on each use of the channel can be 
modeled adequately by a fixed transition diagram such as that shown in 
Fig. 6.17. For any such channel bounds equivalent to Eqs. 6.110-6.112 
(but with R_0' given by Eq. 6.62b) are derived in Appendix 6A. Thus great 
flexibility may be exercised in the design of a modulation and demodulation 
system to be used in conjunction with sequential decoding. 

† Since no decoding algorithm for BCH codes has been devised for a number of channel 
transitions greater than t, a block decoding error is presumed whenever this event 
occurs. In determining the performance of the BCH codes, the values of t and K were 
taken from Table 9.1 in Peterson 66 and provide stronger results than guaranteed by the 
bound K ≥ N - mt. 

An immediate implication is that if very close-grained, rather than 
binary, quantization is used at the matched filter output of an additive 
white Gaussian noise channel, the limiting value of R_N with binary 
convolutional coding can be increased from 

$$R_0' = 1 - \log_2\!\left[1 + 2\sqrt{p(1 - p)}\,\right] \tag{6.114a}$$

to a value arbitrarily close to the unquantized error exponent 

$$R_0 = 1 - \log_2\!\left(1 + e^{-E_N/N_0}\right); \qquad E_N = E_b R_N, \tag{6.114b}$$

with consequent improvement in the efficiency of energy utilization. Also, 
we need not restrict consideration to binary signaling, but can design a 
multiamplitude transmitter modulator to yield an R_0' that is nearly 
optimum for any value of E_b/N_0, as discussed in Chapter 5. A relevant 
experiment is discussed at the end of this chapter. Calculations of R_0' 
for certain propagation channels characterized by random phase shift 
and fading are considered in Chapter 7. 
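The effect of receiver quantization on the exponential bound parameter can be checked numerically. The sketch below assumes antipodal signaling with energy E_N per transmitted symbol, so that the two-level (BSC) transition probability is p = Q(sqrt(2 E_N/N_0)), and uses the expressions 1 - log2[1 + 2 sqrt(p(1 - p))] for binary quantization and 1 - log2(1 + exp(-E_N/N_0)) for the unquantized channel. The function names are illustrative only.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def r0_binary_quantized(p):
    """Bound parameter for a BSC with transition probability p."""
    return 1.0 - math.log2(1.0 + 2.0 * math.sqrt(p * (1.0 - p)))


def r0_unquantized(en_over_n0):
    """Bound parameter for antipodal signals with no output quantization."""
    return 1.0 - math.log2(1.0 + math.exp(-en_over_n0))


for en_over_n0 in (0.1, 1.0, 3.0):
    p = q_function(math.sqrt(2.0 * en_over_n0))
    print(f"E_N/N_0={en_over_n0}: p={p:.4f}, "
          f"binary={r0_binary_quantized(p):.4f}, "
          f"unquantized={r0_unquantized(en_over_n0):.4f}")
```

At every energy-to-noise ratio tried the unquantized value is the larger of the two, and for small E_N/N_0 the two differ by a factor of roughly 1.6, about 2 db.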

Simulation Results 

Analytical difficulties with sequential decoding are such that even the 
tightest bounds that have been derived are not numerically accurate 
enough for purposes of engineering design. In this section and the next we 
summarize some of the results that have been obtained by computer 
simulation of the Fano algorithm at the MIT Lincoln Laboratory. 13 

Figure 6.50 Empirical average number of computations per bit decoded. (Figures 
6.50-6.52 and 6.56 have been made available through the courtesy of G. Bluestein and 
K. L. Jordan. 13 ) 

The actions performed by a decoder in communicating a long sequence 
x of L convolutional coder input bits over a BSC have been simulated on 
a digital computer programed to count the number of code-tree branches 
actually searched during the entire decoding procedure. We define ⟨B⟩ 
as this total divided by L. Thus ⟨B⟩ is the empirical average number of 
branches searched per message input bit decoded. 

In Fig. 6.50 we plot the observed values of ⟨B⟩ as a function of the ratio 
R_N/R_0'. The code constraint length K used in the simulation was equal 
to 60, and was large enough so that no decoding errors occurred. For 
this value of K the suboptimum decoder considered in Section 6.3 would 
make approximately 2^60 ≈ 10^18 branch comparisons per decoded digit, 
and for R_N = 0.9R_0' even the bound of Eq. 6.112 claims only that B < 
8100. In contrast, for the experiments summarized in Fig. 6.50 we 
observe that ⟨B⟩ < 4 for all R_N/R_0' < 0.9. As R_N → R_0', the value of 
⟨B⟩ rapidly becomes large, a behavior that is in accord with the bound of 
Eq. 6.112. Experimentally, ⟨B⟩ is found to depend strongly on the ratio 
of R_N and R_0' but only weakly on the value of these parameters individually. 

Figure 6.51 Average number of computations in decoding as a function of (a) tilt; 
(b) threshold spacing. For both curves, R_N/R_0' = 0.89. 

The data of Fig. 6.50 were taken with the tilt parameter p' (see Eq. 
6.109) and threshold spacing Δ optimized empirically to minimize ⟨B⟩. The 
behavior of ⟨B⟩ as a function of tilt and spacing in a typical case is 
shown in Figs. 6.51a, b. We observe that precise minimization with regard 
to these parameters is not necessary. Finally, it should be remarked 
that the first several digits of the generators {g} used in the experiments 
reported here were carefully chosen to yield good performance 00 ; only 
the tails of the generators were chosen at random. 
Dynamical Decoding Behavior 

Although Fig. 6.50 shows that the average computational demand ⟨B⟩ 
with sequential decoding is quite small for R_N/R_0' < 0.9, the actual 
number of branches, say B, searched by the computer in the dynamical 
process of penetrating from one node to the next in the code tree is extremely 
variable. We denote the relative frequency with which B exceeds any 
number y by F[B > y]. Typical plots of F[B > y], with R_N/R_0' as a 
parameter, are shown in Fig. 6.52. For large values of y the relative 
frequency varies approximately as an inverse fractional power† of y. 
For these data an empirical relationship, valid for y >> 1 and R_N < R_0', 
is 13 

$$F[B > y] \approx 3^{-(1 - R_N/R_0')}\, y^{-(2.9 - 2R_N/R_0')}. \tag{6.115}$$

For an analytical expression, see Problem 6.17. 

Figure 6.52 Empirical distribution function of number of computations in decoding. 
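The slow decay of this empirical relationship is easy to see numerically. The sketch below evaluates the expression 3^-(1-ρ) y^-(2.9-2ρ), with ρ standing for R_N/R_0'; the function name and the symbol ρ are illustrative and do not appear in the text.

```python
def computation_tail(y, rate_ratio):
    """Empirical F[B > y] = 3**-(1 - rho) * y**-(2.9 - 2*rho), rho = R_N/R_0'."""
    if y <= 1 or not 0.0 < rate_ratio < 1.0:
        raise ValueError("relationship is intended for y >> 1 and R_N < R_0'")
    return 3.0 ** (-(1.0 - rate_ratio)) * y ** (-(2.9 - 2.0 * rate_ratio))


for rho in (0.5, 0.85, 0.95):
    row = ", ".join(f"{computation_tail(y, rho):.2e}" for y in (1e2, 1e3, 1e4))
    print(f"R_N/R_0' = {rho}: F[B > 1e2, 1e3, 1e4] = {row}")
```

The nearer the rate is to R_0', the smaller the exponent of y and the more slowly the relative frequency of long searches falls off.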

† A probability distribution with this type of behavior is called “Pareto.” If a Pareto 
random variable is to have finite mean, the exponent of y must be no greater than 
minus one. This condition would be met in Eq. 6.115 for R_N = R_0' if the constant 
2.9 were replaced by 3.0. The empirical value 2.9 reflects a small amount of experimental 
performance degradation. 

The variability of the number of computations has a profound influence 
on the design and evaluation of a sequential decoding system. For 
example, with a BSC and an appropriate convolutional code we anticipate 
that except for short excursions the pattern of channel transitions will 
usually leave the received sequence much closer in Hamming distance to 
the sequence actually transmitted than to any of the other possible 
encoder outputs. Then a Fano decoder typically searches out the correct 
path through the code tree with great rapidity, which accounts for the 
small value of ⟨B⟩. In such circumstances the pointer designating the 
depth of search of the decoder in Fig. 6.48 hovers near the input end of 
the decoder and all is well. 

Infrequently, however, the channel transition pattern will cause the 
tilted distance along the correct path to increase with l over a span of 
considerable length. In this atypical event the decoder must search far 
back into the code tree, examining an enormous number of branches 
before it can follow the correct path over the local maximum, as illustrated 
in Fig. 6.53. The probability of a deep search decreases, but the resulting 
number of computations increases, rapidly with search depth. The balance 
between the two effects accounts† for the fact that F[B > y] is only a 
slowly decreasing function of y. In such circumstances the on-pouring 
stream of received digits may force the depth of search pointer to the output 
end of the decoder memory, a condition called overflow. 

Figure 6.53 A plot of t(l) along the correct path which involves a great amount of 
computation in decoding. Before reaching node 15 the decoder must examine each 
path diverging from nodes 4 through 14 until the path crosses threshold -3Δ. 

† The distribution of computation is analyzed in J. E. Savage, “The Computation 
Problem with Sequential Decoding,” Ph.D. Thesis, February 1965. Additional 
empirical data is to be found in K. L. Jordan, “The Performance of Sequential Decoding 
in Conjunction with Efficient Modulation,” IEEE Trans. Comm. Tech., COM-14, 
283-297, June 1966. 
With surpassing infrequency, the channel transition pattern may create 
a received sequence that closely approximates one of the possible, but 
incorrect, coder outputs. As an extreme example, if the channel noise 
sequence is 

$$\mathbf{n} = \mathbf{f}_h, \tag{6.116}$$

where f_h is the hth translate of the code generator sequence g, it follows 
from Eq. 6.83 that the received sequence is exactly that code word which 
would have been transmitted had the hth digit of x in fact been different. 
In this case x_h will with certainty be decoded both easily and incorrectly. 
Even in less extreme cases it is possible for several incorrect digits to be 
released from the x̂-register without a large increase in the number of 
branches searched and concomitant dragging of the search position 
pointer to the right. Such events are called undetectable errors. 

Figure 6.54 An abstract representation of the events “overflow” and “error” for the 
BSC. The dots represent the encoder outputs {y_i}. Received signals lying in the open 
region are not decoded because of overflow. 

From an operational point of view, we need to distinguish between 
errors and overflows. For conceptual purposes the distinction may be 
envisioned geometrically as shown in Fig. 6.54: when the actual encoder 
input is x_k, channel transition patterns producing received sequences that 
lie within the crosshatched area correspond to correct operation of the 
decoder. By symmetry, received sequences lying within the shaded areas 
must therefore yield undetectable errors. Intermediate transition patterns, 
producing received sequences lying within the open region, yield overflow. 

In order to be practicable, a sequential decoder must be designed and 
operated so that both the overflow and error probabilities will be very small. 
Obtaining a small overflow probability is the more difficult problem, and 
we discuss it first. 
Probability of overflow. The magnitude of the overflow probability 
depends primarily on the size of the decoder memory and the computational 
speed of the decoder. Like ⟨B⟩, it is insensitive to the code 
constraint length but very sensitive to R_0'/R_N. 

We gain insight into the probability of overflow with the Fano algorithm 
by considering the decoding of a new input bit on the assumption that 
the decoder's depth of search pointer starts out at the extreme left of the 
memory, as in Fig. 6.55. If Γ is the number of received branches that can 
be stored in the memory, an overflow is then certain to occur unless the 
decoder is able to penetrate at least one branch deeper into the code tree 
before Γ new branches are received. An equivalent statement is that 
overflow must occur if more than λΓ branches are searched before 
additional penetration, where λ denotes the number of branches the 
decoder can search in the time allotted to the transmission of each branch. 
From Eq. 6.115 the relative frequency, say F, of this event is 

$$F = F[B > \lambda\Gamma] \approx 3^{-(1 - R_N/R_0')}\,[\lambda\Gamma]^{-[2.9 - 2(R_N/R_0')]}. \tag{6.117}$$

Figure 6.55 Initial condition of decoder for simplified overflow analysis. 

In the process of searching the decoder moves back into the code tree, 
so that an overflow may occur before λΓ branches have been searched. 
As a practical matter, however, whenever λ > ⟨B⟩ and the right-hand side 
of Eq. 6.117 is very small, F provides a useful estimate of the probability 
of overflow.† The validity of this statement rests on the waiting-line 
behavior observed in the course of the decoder simulation experiments. 
By waiting-line we mean the position of the decoder depth of search 
pointer, under the assumption that Γ is infinite. 

† If λ is less than ⟨B⟩, the decoder will not even be able to keep pace with the average 
computational demands of the received message stream. 
In Figs. 6.56a, b, c we show the types of waiting-line behavior that 
result when λ is held fixed and R_N/R_0' is varied. For R_N/R_0' = 0.89 and 
λ = 3⟨B⟩ the waiting line is usually zero, although the channel transition 
pattern causes an occasional long search and consequent intermittent 
waiting-line buildup. For R_N/R_0' = 0.96 the long searches are more 
frequent and there is danger that the residuum of one waiting-line buildup 
is not cleaned out before the next one occurs. For R_N/R_0' = 1.06 the long 
searches coalesce and the waiting line is unbounded. 

A small overflow probability with reasonable values of Γ and λ is 
possible only when the waiting-line behavior is typified by Fig. 6.56a. 
In such cases, overflows are primarily attributable to difficulty in decoding 
isolated message input digits, and the probability of overflow interpretation 
of Eq. 6.117 is meaningful. With interesting values of data 
rate, a value of Γ >> K is then required if F is to be small, and it is Γ, 
rather than the code constraint length K, that primarily governs the size of 
the decoder. 

As an example, assume for a BSC that

$$F = 10^{-6}$$
$$R_N = \tfrac{1}{3}$$
$$R_0' = \frac{R_N}{0.85} = 0.392 \quad (p = 0.074)$$
$$K = 100$$
$$R = 20 \text{ kilobits/sec}.$$

From Eq. 6.117 we require

$$10^{-6} = 3^{-(1 - 0.85)}\,(\lambda\Gamma)^{-(2.9 - 1.7)},$$

or

$$\lambda\Gamma = 8.7 \times 10^{4}.$$

If it takes 7.5 μsec for a special-purpose decoding computer to search a
branch, we have

$$\lambda = \frac{\text{time to receive a branch}}{\text{time to search a branch}} = \frac{0.05 \times 10^{-3}}{7.5 \times 10^{-6}} \approx 6.7,$$

which, from Fig. 6.50, meets the requirement $\lambda > \langle B \rangle$. Hence we need

$$\Gamma = 1.3 \times 10^{4} \gg K.$$


For this example each of the received branches requires 3 bits for
storage ($v = 1/R_N = 3$), and the storage of each decoded hypothesis $\hat{x}_h$
requires 1 bit. Thus the total bit storage requirement is approximately
$4\Gamma = 5.2 \times 10^{4}$. With magnetic-core memories, this is quite feasible.
It should be noted, however, that it would be difficult to make the
overflow probability very much smaller without sacrificing data rate. For
example, fixing $R_N/R_0' = 0.85$ and increasing $\lambda\Gamma$ by a factor of 100
reduces $F$ only from $10^{-6}$ to $10^{-8.4}$. The overflow probability is not an
exponentially decreasing function of the decoder memory size or speed.
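
The arithmetic of this example can be checked with a short sketch. It assumes that Eq. 6.117 has the form F ≈ 3^-(1 - x) (λΓ)^-(2.9 - 2x) with x = R_N/R_0', as used in the worked example; the function and variable names are illustrative and do not come from the text.

```python
import math

def overflow_frequency(rate_ratio, lam_gamma):
    """Empirical estimate of Eq. 6.117 (assumed form), with rate_ratio = R_N/R_0'."""
    return 3.0 ** (-(1.0 - rate_ratio)) * lam_gamma ** (-(2.9 - 2.0 * rate_ratio))

def required_lam_gamma(rate_ratio, target_f):
    """Invert the estimate for the product lambda*Gamma that yields target_f."""
    exponent = 2.9 - 2.0 * rate_ratio
    return (target_f * 3.0 ** (1.0 - rate_ratio)) ** (-1.0 / exponent)

rate_ratio = 0.85
lam_gamma = required_lam_gamma(rate_ratio, 1e-6)   # about 8.7e4

branch_time = 1.0 / 20e3     # seconds per branch: one data bit per branch at 20 kilobits/sec
search_time = 7.5e-6         # seconds needed to search one branch
lam = branch_time / search_time                    # branches searched per branch received
gamma = lam_gamma / lam                            # decoder memory size in branches
storage_bits = 4 * gamma     # 3 received bits plus 1 hypothesis bit per branch

# Effect of a hundredfold increase in lambda*Gamma at the same rate ratio.
f_100x = overflow_frequency(rate_ratio, 100 * lam_gamma)

print(f"lambda*Gamma = {lam_gamma:.3g}, lambda = {lam:.2f}, Gamma = {gamma:.3g}")
print(f"storage = {storage_bits:.3g} bits, log10(F) after 100x = {math.log10(f_100x):.2f}")
```

Because the exponent 2.9 - 2x is only 1.2 at x = 0.85, each factor of 100 in λΓ buys just 2.4 decades of overflow frequency, which is the algebraic (not exponential) dependence noted above.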

Probability of error. Since the overflow probability is controlled by
the decoder memory size and speed of computation, it is difficult to make
it extremely small. This is not true of the probability of undetectable
error. Consistent with Eq. 6.110, the undetectable error probability for a
Fano decoder decreases exponentially with the convolutional code
constraint length $K$. By choosing $K$ large enough, we can achieve an
arbitrarily small error probability, provided that $R_N < R_0'$. Furthermore,
in interesting cases it is possible to attain incredibly minute values of error
probability (say $10^{-12}$) with values of $K$ that are orders of magnitude
less than the decoder memory size $\Gamma$ required to make the overflow
probability reasonably small (say $10^{-6}$). Thus the incremental cost of
undetectable error control is not material.

Insight into the relative insignificance of the undetectable error prob- 
ability is provided by reconsideration of the geometrical representation 
of Fig. 6.54. The shaded regions, which correspond to undetectable 
errors, typically occupy only a small fraction of the total number of 
points in the received signal space. Moreover, the shaded regions are 
typically far apart from one another in Hamming distance and surrounded 
by the open region, which corresponds to overflows. It is therefore 
reasonable that when the overflow probability is small the probability 
of undetectable error is minute.

This conclusion has been verified analytically.$^{28,78}$ Not only does
the undetectable error probability decay exponentially with K, but it 
decays with a considerably larger exponent than that given by the random- 
coding bound of Eq. 6.110. The reason that the bound is weak is that it 
neglects the possibility of overflow. The effect is interpreted geometrically 
in Fig. 6.57. 

There is another contributant to the over-all error probability which
we have not yet discussed. This contributant is the probability that an
incorrect hypothesis $\hat{x}_h$ is forced out of the decoder just before an overflow
occurs. For the Fano decoder structure considered thus far the probability
of such an event is comparable to the overflow probability. Indeed, the
code structure of Fig. 6.35 implies that the usual number of computations
involved in a search $l$ branches back into the code tree grows exponentially
with $l$. Thus even moderate values of $l$ typically force the depth of search
pointer rapidly to the right, so that overflows occur because the decoder
is uncertain of the identity of the hypothesized message bits at the extreme
right end of the $\hat{x}$-register. Since errors of this type accompany overflows,
they are called detectable.

Figure 6.57 Geometrical interpretation of the weakness of the error probability
bound. An undetectable error occurs when $y_k$ is transmitted if $r'$ lies in one of the
shaded regions. On the other hand, the bound on $P[\mathcal{E}_h]$ estimates the probability that
$r'$ lies outside the decision region $I_k$ that is applicable in the absence of overflow.

Decoder release of detectable errors can be controlled by modifying
the decoder in a simple way: we need only extend the $\hat{x}$-register several
code constraint lengths (say $3K$ digits) beyond the end of the decoder
$r$-register, as shown in Fig. 6.58. The effectiveness of the procedure rests
on the fact that, if the decoder never needs to search back to an $\hat{x}_h$ that
passes beyond the end of the $r$-register, $\hat{x}_h$ must either be correct or
correspond to an undetectable error. On the other hand, if the decoder
does need to search beyond the confines of the $r$-register, an overflow
will occur. When the point at which hypothesized bits are actually
released from the decoder is sufficiently far removed from the overflow
position, the probability that overflow will result before enough time has
elapsed for an incorrect hypothesis to be released becomes comparable
to the probability of an undetectable error. If we stop decoding at the
instant of overflow, extension of the $\hat{x}$-register yields an over-all error
probability so infinitesimal that it may safely be neglected.

Figure 6.58 Modification of Fano decoder to incorporate error detection.
(Diagram labels: $\Gamma$ branches; depth of search pointer; $r$-register; $\hat{x}$-register.)

Two-Way Strategies 

In an operative sequential decoding system it is, of course, not sufficient 
that the error probability be negligibly small; provision must also be 
made to start decoding again automatically after each overflow. In a 
one-way communication system this can be arranged by periodically 
interrupting the message input bit stream to the transmitter’s encoder, 
say after every block of L bits, and arbitrarily inserting K zeros. The 
decoder can then always resynchronize within L data bits after an overflow 
and thus continue on with its work by discarding the undecodable block. 

On the one hand, for $K = 100$, $F = 10^{-6}$, $\Gamma = 10^{4}$, and $L = 10^{3}$ such
a one-way strategy implies only a 10% reduction in effective transmitted
data rate, that is, reduction by the factor $(L - K)/L$. But on the other
hand, from the union bound there is a probability of approximately†
$LF = 10^{-3}$ that all or part of each block of $L = 10^{3}$ message bits will be
discarded, hence not decoded at all, even though blocks that are decoded
are almost certainly correct. In many applications an operating
characteristic of this type would not be acceptable.
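
The two costs of the one-way strategy can be tabulated with a small sketch. The function name and argument names are illustrative only; the block-discard figure is the union-bound estimate LF quoted in the text, which ignores dependence between overflows.

```python
def one_way_costs(K, F, L):
    """Return (rate factor, block-discard estimate) for K resync zeros per L-bit block."""
    rate_factor = (L - K) / L          # fraction of transmitted bits that carry data
    block_discard = min(1.0, L * F)    # union bound on losing all or part of a block
    return rate_factor, block_discard

rate_factor, block_discard = one_way_costs(K=100, F=1e-6, L=1000)
print(f"rate factor = {rate_factor:.2f}, block discard probability ~ {block_discard:.1e}")
```

Lengthening the block improves the rate factor but raises the discard estimate in proportion to L, which is the tension that motivates the two-way strategy.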

When communication is two-way, a more attractive remedy to the over- 
flow problem is available. A few “service” bits can be inserted into each 
of the two data streams at specified intervals, as indicated in Fig. 6.59. 
The service bits originating at terminal A inform terminal B whether 
decoding at A has been stopped because of overflow. If so, terminal B 
retransmits the undecodable message. If not, terminal B continues with 
new traffic. Each terminal follows an identical strategy. 

The crucial aspect of such a two-way system is that the service bits are
themselves encoded. Thus even when both channels are noisy the
probability that an instruction will be misinterpreted equals the probability
of decoding error and may largely be ignored. There remains, however,
the probability that a service bit will not be decoded at all, and since
this is related to the overflow probability it may not be ignored.

† This estimate of the overflow probability neglects the fact that the procedure introduces
a statistical dependence between overflows: when resynchronizing after an overflow
on one block of $L$ digits, the decoder's depth of search pointer is set initially to the
beginning of the succeeding block rather than all the way back to the input end of the
decoder memory.

The difficulty is resolved by adopting a fail-safe strategy wherein each
terminal when confronted with an undecodable block always acts exactly
as if it had in fact decoded a request for retransmission. Matters can be
arranged$^{88}$ so that the entire data stream at each encoder input is decoded
in proper sequence and without block elisions, provided that undetectable
errors do not occur. Thus the one-way problem of undecodable blocks
is circumvented. The major incremental cost incurred is the provision
at each transmitter of a slow speed memory in which to store the message
input traffic over a time interval equal to the combined round-trip
propagation and data-processing time, say $T$.

Figure 6.59 (a) A two-way communication system. The transmitter buffers store the
encoder input data streams for possible retransmission. It is assumed that the input
message at each terminal is available on demand. (b) Structure of encoder input data
stream. The shaded intervals represent service bits; the open intervals represent
customer traffic.
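
The fail-safe rule can be stated as a tiny decision function. This is only a sketch of the rule as described in the text; the names `Action` and `next_action` and the boolean inputs are hypothetical, and the buffering and sequencing needed to avoid block elisions are not modeled.

```python
from enum import Enum

class Action(Enum):
    SEND_NEW = "send new traffic"
    RETRANSMIT = "retransmit stored traffic"

def next_action(block_decoded, peer_reports_overflow):
    """Decide what a terminal transmits next.

    block_decoded: True if the incoming block (service bits included) was decoded.
    peer_reports_overflow: decoded service bit; meaningful only if block_decoded.
    """
    if not block_decoded:
        # Fail-safe: an unreadable service bit is treated as a repeat request.
        return Action.RETRANSMIT
    return Action.RETRANSMIT if peer_reports_overflow else Action.SEND_NEW

for decoded, overflow in [(True, False), (True, True), (False, False), (False, True)]:
    print(decoded, overflow, "->", next_action(decoded, overflow).value)
```

The only way this rule can send new traffic when a repeat was wanted is for a service bit to be decoded incorrectly, an event with the (negligible) probability of undetectable error.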

Each repeat-request involves the “loss” (for communication purposes)
of the combined round-trip time; that is, the transmitter re-encodes all
input data that it has transmitted during the preceding $T$-sec interval.
Since the relative frequency of repeat-requests is $F$, the average fraction
of time remaining is $(1 - TF)$. As long as $F$ can be made small enough
that $TF < 0.1$, the effective rate at which data is communicated (in
bits per second) is not seriously reduced by overflows. A safeguard
against undetectable service-bit errors can be provided by periodic
resynchronization after every $10^{6}$ or $10^{7}$ transmitted bits.

The need for a two-way system appears to be unavoidable if com- 
munication that is both accurate and efficient is to be maintained over 
actual communication channels. The parameters of most channels are
time-variant, so that their reliability functions fluctuate. A system without 
feedback must be designed to operate reliably at a data rate commensurate 
with the worst channel condition, which is inconsistent with efficiency 
when conditions are good. With a two-way strategy, however, the service 
bits may also be used to request changes in transmitted data rate. An 
example of such a system is discussed in the next section. 

Experimental System Design 

In Chapters 5 and 6 we have been concerned with the relationship
between modulation-demodulation and coding-decoding on the one hand
and system performance/complexity on the other. For block codes with
maximum likelihood decoding the interrelations (excluding considerations
of equipment complexity) are evidenced in the bound

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0' - R_N]}, \qquad (6.118a)$$

in which $N R_N = K$ is the code constraint length in data bits, and $R_N$
is the data rate in bits per transmitted symbol.

With bandlimited channels, we are usually concerned with data rate in
bits per second. If $D$ denotes the number of orthogonal building-block
signals that can be propagated over the channel per second, Eq. 6.118a
can be rewritten in the form

$$\overline{P[\mathcal{E}]} < 2^{-T[DR_0' - R]}, \qquad (6.118b)$$

in which $R = DR_N$ is the data rate in bits per second and $T = K/R$ is the
block duration in seconds. Equation 6.118b implies that an arbitrarily
small error probability can be obtained by making $T$ large enough,
provided, of course, that $R < DR_0'$. In order to communicate at a high data
rate, we wish to design the modulation-demodulation system in such a
way that $DR_0'$ is large.

When we consider sequential decoding and include decoder complexity 
in the formulation of a two-way-system design problem, the interrelation- 
ships are somewhat modified. By choosing T large enough, we obtain as 
small a decoded probability of error as desired; it is sensible to choose T 
large enough so that the probability of error is truly negligible. The 
primary design problem is to make TF small, where as before T is the 
combined round-trip propagation and data-processing time and F is the
relative frequency of overflows. 

In this connection also it is the value of $DR_0'$ that determines the
allowable data rate. Since

$$\frac{R_N}{R_0'} = \frac{R}{DR_0'}, \qquad (6.119)$$

the frequency of overflow estimate of Eq. 6.117 can be rewritten in the
form

$$F \approx 3^{-(1 - R/DR_0')}\,[\lambda\Gamma]^{-[2.9 - 2(R/DR_0')]}. \qquad (6.120)$$

Thus the value of $\lambda\Gamma$ required to obtain a stated value of $F$ decreases as
$DR_0'$ is increased with $R$ held fixed.
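
The trend just stated can be illustrated numerically. The sketch below assumes the overflow estimate has the form F ≈ 3^-(1 - x) (λΓ)^-(2.9 - 2x) with x = R/(DR_0'), as in the worked example for Eq. 6.117; the function name and the trial values of DR_0' are illustrative assumptions, and the estimate is empirical, so it should not be trusted far from the rates at which it was measured.

```python
def required_lam_gamma(R, D_R0, target_f):
    """lambda*Gamma needed for overflow frequency target_f at data rate R (bits/sec)."""
    x = R / D_R0                      # equals R_N / R_0' by Eq. 6.119
    if x >= 1.0:
        raise ValueError("the estimate is used only for R < D * R_0'")
    exponent = 2.9 - 2.0 * x
    return (target_f * 3.0 ** (1.0 - x)) ** (-1.0 / exponent)

R = 20e3                              # data rate held fixed
trial_D_R0 = [22e3, 20e3 / 0.85, 30e3, 40e3]   # illustrative values of D * R_0'
needs = [required_lam_gamma(R, d, 1e-6) for d in trial_D_R0]

for d, n in zip(trial_D_R0, needs):
    print(f"D*R0' = {d:8.0f}  ->  lambda*Gamma = {n:.3g}")
```

The second trial value reproduces the earlier example (x = 0.85), and the required product falls steadily as DR_0' grows.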

When sequential decoding is to be used, the value of DR 0 ' is a funda- 
mental measure of the effectiveness of the modulation-demodulation 
scheme that produces it. An appropriate design philosophy is to maximize 
DR 0 ', subject to constraints on the complexity of instrumentation. For 
the simple case of a bandlimited additive white Gaussian noise channel 
we have seen in Chapter 5 that there are unavoidable theoretical limits to 
the maximum achievable values of both D and R 0 ’. The complexity of 
instrumentation will increase if we attempt to push either of these param- 
eters too close to its theoretical limit. One must settle for a design that 
approaches the limit of diminishing returns. 

The maximization of DR 0 ' with actual communication channels will 
usually be complicated by the lack of an adequate mathematical model 
of the channel disturbance. An experimental investigation of how one 
might proceed has been made in connection with a toll-quality long- 
distance voice telephone line. The experiment, conducted by the MIT 
Lincoln Laboratory,$^{54}$ provides an instructive example of the interplay
between the various design factors introduced in Chapters 5 and 6. 


Intersymbol interference. With toll telephone lines, the main disturb- 
ance in propagation is not additive Gaussian noise but intersymbol inter- 
ference: if we try to transmit a narrow pulse such as that shown in Fig. 
6.60a, we actually receive a smeared pulse of longer duration such as that 
shown in Fig. 6.60b.† The smearing is primarily attributable to the
fact that the phase of the telephone-line transfer function is not a linear 
function of frequency. The result is that different frequency groups in the 
transmitted spectrum propagate with different velocities and therefore 
arrive at the receiver with different delays. Although the 3-db amplitude 


† In this discussion we presume that the telephone line is terminated in such a way
that the input and output signals have low-pass spectra. How this may be accomplished 
will be studied in Chapter 7. 
bandwidth of the telephone channel used in the experiment was 3.4 kc, the
unequalized bandwidth affording delay variations of less than ±£msec 
was only 1.9 kc. 

Intersymbol interference may be controlled to some extent by careful
phase equalization of the line and careful shaping of the transmitted pulse.
Even when this is done, however, an unavoidable increase in the residual
intersymbol interference occurs as the pulse duration is narrowed and the
pulse repetition rate increased. If we treat this interference as noise, it
follows that making $D$ larger decreases $R_0'$. We cannot maximize $D$ and
$R_0'$ separately.

Figure 6.60 Example of the response of a telephone line to a short pulse:
(a) transmitted pulse; (b) received pulse.

Signaling alphabets. Toll-quality telephone lines are normally
characterized by a high signal-to-noise ratio. Given adequate suppression of
intersymbol interference, we therefore anticipate (from Fig. 5.17) that the
transmitter alphabet should provide many more than two amplitude
levels if the bound parameter $R_0$ afforded by an unquantized receiver is to
be maximized. We also anticipate from Fig. 6.26 that the quantized
parameter $R_0'$ should be substantially equivalent to $R_0$ if the receiver
output samples are quantized in such a way that the multiamplitude
transmitter and decoder alphabets are the same. The problem of alphabet
design then reduces to the determination (as a function of $D$) of how many
signal amplitude levels should be used.

For alternative choices of D and number of amplitude levels, A, 
empirical estimates of the resulting transition probabilities between letters 
of the input and output alphabets can be made from experimental 
measurements. An optimal design procedure is then to calculate cor- 
responding values of R 0 ' from Eq. 6.62 and to choose that A and D for 
which DR 0 ' is maximum. 
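
A sketch of this selection procedure follows. Eq. 6.62 is not reproduced in this section, so the code assumes it takes the usual form for a discrete memoryless channel, R_0' = -log2 Σ_j (Σ_i p_i √q_ij)², with input probabilities p_i and transition probabilities q_ij; the function names are illustrative. No experimental data are included: the only numbers used are those of the BSC example earlier in the chapter (p = 0.074), as a sanity check.

```python
import math

def cutoff_rate(p, q):
    """Assumed form of Eq. 6.62: -log2 sum_j (sum_i p_i sqrt(q_ij))^2, bits per symbol."""
    total = 0.0
    for j in range(len(q[0])):
        inner = sum(p[i] * math.sqrt(q[i][j]) for i in range(len(p)))
        total += inner * inner
    return -math.log2(total)

def best_design(candidates):
    """candidates: iterable of (A, D, p, q) tuples built from measured transition
    probabilities; returns the tuple for which D * R_0' is largest."""
    return max(candidates, key=lambda c: c[1] * cutoff_rate(c[2], c[3]))

# Sanity check against the BSC example (equiprobable inputs, p = 0.074).
bsc = [[0.926, 0.074], [0.074, 0.926]]
r0_bsc = cutoff_rate([0.5, 0.5], bsc)
print(f"R_0' for the BSC example: {r0_bsc:.4f} bits per symbol")
```

In use, each candidate pair (A, D) would contribute one empirically estimated matrix q, and `best_design` would return the pair to adopt.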

Decoding distance. We now consider the determination of a suitable 
distance function for the decoder to use in testing possible transmitted 
signal hypotheses against the received data. For Gaussian noise we would 
use Euclidean distance and for BSC noise we would use Hamming distance. 
In each case the choice is dictated first by the optimality of using a distance 
measure that is monotonically related to a posteriori probability and 
second by the relative ease of decoder instrumentation. 

In our telephone-line experiment a decoder that used the empirical 
estimates of the transition probabilities to compute the a posteriori 
probability of the received signal, given any transmitted signal hypothesis, 
would be most desirable from the point of view of performance. The value 
of R 0 ' would then be that given by Eq. 6.62, and in principle we could even 
contemplate designing the coder to use the transmitter alphabet letters in 
proportions that maximize $R_0'$. In practice, however, the implementation
of such a coder and decoder would be extremely difficult. We seek an 
acceptable engineering compromise instead. 

Fortunately, it is not necessary to use an optimum system in order to 
obtain good results; in engineering design one seeks not so much to be
optimum as to avoid crippling nonoptimalities. In the particular cascade 
of modulator/telephone line/demodulator with which we are concerned, 
the dominant characteristic of the over-all disturbance is that small errors 
in received amplitude level are much more prevalent than large ones. We 
therefore anticipate that an appropriate, albeit non-optimum, decoding 
distance function might be the cumulative sum of the absolute voltage 
difference between received and hypothesized signals. For example, if the
quantized received pulse amplitudes are

$$\mathbf{r}' = (r_1', r_2', \ldots, r_N') \qquad (6.121a)$$

and the signal hypothesis is

$$\mathbf{s}_i = (s_{i1}, s_{i2}, \ldots, s_{iN}), \qquad (6.121b)$$

for decoding purposes we may define the distance between $\mathbf{r}'$ and $\mathbf{s}_i$ as

$$d_i \triangleq \sum_{j=1}^{N} |r_j' - s_{ij}|. \qquad (6.121c)$$

The distance function of Eq. 6.121c is monotonically related to the
logarithm of the a posteriori probability when all signals are equally likely
and the channel disturbance is an additive noise vector $\mathbf{n} = (n_1, n_2, \ldots, n_N)$
with statistically independent, exponentially distributed components;
that is, when

$$p_{n_j}(\alpha) = \frac{b}{2}\, e^{-b|\alpha|}; \quad j = 1, 2, \ldots, N, \quad b > 0, \qquad (6.122a)$$

and

$$p_{\mathbf{n}} = \prod_{j=1}^{N} p_{n_j}. \qquad (6.122b)$$

Of course, our cascade of modulator/telephone line/demodulator cannot
be described precisely in this way. The discrepancy, however, was not
too great for the several different choices of multiamplitude signaling
alphabets and values of $D$ tested in the experiment. Most important of all,
the distance function of Eq. 6.121c has the advantage of being easy to
implement in a decoder, and was therefore adopted.
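
The monotone relation between the absolute-difference distance and the logarithm of the likelihood under two-sided exponential noise can be seen in a few lines. This is a sketch, assuming the distance is the sum of |r_j' - s_ij| and the noise density is (b/2)e^(-b|α|); the amplitude values and the value of b are made up for illustration.

```python
import math

def abs_distance(received, hypothesis):
    """Cumulative absolute difference between received and hypothesized amplitudes."""
    return sum(abs(r - s) for r, s in zip(received, hypothesis))

def log_likelihood(received, hypothesis, b):
    """log of prod_j (b/2) exp(-b |r_j - s_j|) for independent noise components."""
    n = len(received)
    return n * math.log(b / 2.0) - b * abs_distance(received, hypothesis)

received = [3, 7, 12]                                  # illustrative quantized levels
hypotheses = [[5, 7, 10], [0, 0, 0], [3, 8, 12]]       # illustrative signal hypotheses

by_distance = sorted(hypotheses, key=lambda s: abs_distance(received, s))
by_likelihood = sorted(hypotheses, key=lambda s: -log_likelihood(received, s, b=0.8))
print(by_distance == by_likelihood)
```

Since the log-likelihood is a constant minus b times the distance, ranking hypotheses by increasing distance is the same as ranking them by decreasing likelihood for any b > 0, so the decoder never needs to know b.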

Experimental results. Once a suitable decoding distance function is
adopted, a convenient and reasonable compromise method for estimating
$R_0'$ is to fit the empirical data to the density function appropriate to that
distance function rather than to insert the empirical data directly into
Eq. 6.62. (In general, less empirical data is necessary.) In the
telephone-line experiment the exponential density function of Eq. 6.122 was used,
and the parameter $b$ was adjusted as a function of $D$ and the alphabet size
to give the best match to the corresponding experimental data. The
resulting set of exponential density functions was then used to calculate an
estimate of the actual error exponent $DR_0'$ for each choice of $D$ and number
of signaling levels. With the pulse waveshape and phase-equalization
techniques used in the experiment,† the estimate of $DR_0'$ was found to be
maximum for $D = 3000$ pulses/sec and a signaling alphabet comprising
32 different amplitude levels, and these choices were adopted. At this
maximum the relative frequency of receiving the same signal level as the
one actually transmitted was only 0.9. A modulation-demodulation
system suitable for use in a coded communication system is not necessarily
suitable for a system in which coding is not used, and conversely.

† The pulse waveshape used in the experiment was adjusted to minimize intersymbol
interference for each value of $D$ by observation of the received waveform. This was
possible because both ends of the telephone line terminated at the same laboratory
bench. Other techniques of interference suppression would be necessary in practical
applications. See, for example, F. K. Becker et al., “Automatic Equalization for Digital
Communication,” Proc. IEEE, 53, No. 1, 96-97, January 1965.

The convolutional coder used in the telephone-line experiment had a
constraint length of $K = 60$. It could operate at three data rates: $R_N =
\tfrac{1}{5}$, $\tfrac{2}{5}$, and $\tfrac{3}{5}$. The binary output sequence from the coder was framed into
groups of 5 digits, each of which corresponded to 1, 2, or 3 input bits, the
number depending on $R_N$. Each successive group of 5 digits was fed in
parallel to a binary-analog converter and specified one of 32 possible
amplitudes as output. The resulting sequence of voltage levels was used
to modulate successive signal pulses.

Figure 6.61 Simplified block diagram of telephone line experiment.
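
The framing step can be sketched as follows. The text does not say how a 5-digit group was mapped to an amplitude, so the natural binary mapping used here (first digit most significant) is an assumption, and the function names are illustrative.

```python
def bits_to_level(group):
    """Map a 5-digit binary group to an amplitude index 0..31 (assumed natural binary)."""
    if len(group) != 5 or any(b not in (0, 1) for b in group):
        raise ValueError("expected exactly five binary digits")
    level = 0
    for b in group:
        level = (level << 1) | b
    return level

def level_to_bits(level):
    """Inverse mapping, as an analog-binary converter would produce at the receiver."""
    if not 0 <= level <= 31:
        raise ValueError("level must lie in 0..31")
    return [(level >> shift) & 1 for shift in (4, 3, 2, 1, 0)]

def frame(digits):
    """Split the coder output into successive 5-digit groups, one level per pulse."""
    return [bits_to_level(digits[i:i + 5]) for i in range(0, len(digits), 5)]

coder_output = [1, 0, 1, 1, 0, 0, 0, 0, 0, 1]
levels = frame(coder_output)
print(levels, [level_to_bits(v) for v in levels])
```

Each pulse therefore carries 5 coder digits regardless of data rate; only the number of message bits represented by those 5 digits changes with the rate.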

At the receiver the procedure was reversed. A block diagram of the 
over-all system is shown in Fig. 6.61. The received signal r(t) was sampled
in synchronism with the transmitted pulses, and the resulting sequence of 
samples was fed to an analog-binary converter. Each analog sample was 
quantized into one of 32 levels, and the 5 resulting binary digits were 
passed in parallel into a storage buffer. 

The sequential decoder used in this experiment antedated the Fano 
search algorithm, and the buffer was separate from the decoder itself. As 
each successive 5-digit data group was decoded and shifted out of the 
decoder r-register the oldest data group stored in the buffer was shifted 
in. The buffer and r-register contained 3000 and 300 bits, respectively. 
The data rate at which the equipment operated was controlled by
the decoder. Whenever decoding was too difficult or buffer overflows
occurred, the decoder automatically sent a service bit to the encoder
requesting a decrease in data rate. Conversely, whenever the decoding
was too easy, the decoder automatically requested an increase in $R_N$. The
data rate was observed to fluctuate primarily between $R_N = \tfrac{3}{5}$ and
$R_N = \tfrac{2}{5}$, with occasional reductions to $R_N = \tfrac{1}{5}$. Rate changes occurred
with an average frequency of one in 6 or 7 sec.

The entire apparatus was operated for a total of 40 hr, spread over all 
times of day. The average data rate over the period of operation was 
approximately 7500 decoded bits/sec. More than $10^{9}$ message bits were
decoded before the first decoding error was made, and this error was
detected before the next few decoded digits had been released. Since the
delay between bit decoding and bit release was only $K$ bits, this decoding
error would not have been made if an additional $2K$-bit delay had been
added onto the decoder $\hat{x}$-register. By way of comparison, high quality
conventional telephone-line data-communication equipment operates
without coding over comparable channels at approximately 2400 bits/sec
with an approximate error probability of $10^{-5}$.

Coding seems most appropriate on channels over which good transmission
is expensive (or impossible) to obtain; the experiment described
above was performed on a telephone line primarily because of the relative 
ease of experimentation. Nevertheless, the design procedure used affords 
insight into the compromises that arise in the engineering of coded com- 
munication systems. 

APPENDIX 6A EXTENSION AND ANALYSIS 
OF THE FANO ALGORITHM 

In this appendix we obtain upper bounds on quantities that estimate the 
average number of computations and the probability of error for the Fano 
sequential decoding algorithm. Before doing so, it will be useful to observe 
that the algorithm may be applied with no essential change to tree codes 
in which u, rather than 2, branches diverge from each node and to channels 
far more general than the BSC. We limit attention here to discrete
memoryless channels (see Fig. 6.17) with $A$ input letters $\{a_i\}$, $Q$ output letters
$\{b_j\}$, and transition probabilities $\{q_{ij}\}$ ($i = 1, 2, \ldots, A$; $j = 1, 2, \ldots, Q$).

We shall consider convolutional codes in which each branch of the code
tree encompasses $v$ channel input letters. The rate for a tree with $u$
branches per node and $v$ channel symbols per branch is defined as

$$R_N \triangleq \frac{1}{v} \log_2 u \quad \text{bits per channel symbol}, \qquad (6A.1)$$

which is a generalization of Eq. 6.82. The code constraint length, $K$, is
still defined as the span of code tree branches affected by any one coder
input symbol. Examples of the type of code tree to be considered are
shown in Fig. 6A.1.
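
The rate definition of Eq. 6A.1 is easy to evaluate for the two trees of Fig. 6A.1, whose parameters (u = 2, v = 3 and u = 3, v = 1) are given in the figure caption. The function name below is illustrative.

```python
import math

def tree_code_rate(u, v):
    """R_N = (1/v) log2(u): bits per channel symbol for u branches per node
    and v channel symbols per branch (Eq. 6A.1)."""
    if u < 2 or v < 1:
        raise ValueError("need at least 2 branches per node and 1 symbol per branch")
    return math.log2(u) / v

rate_a = tree_code_rate(u=2, v=3)   # binary tree, three symbols per branch
rate_b = tree_code_rate(u=3, v=1)   # ternary tree, one symbol per branch
print(f"(a) R_N = {rate_a:.4f}   (b) R_N = {rate_b:.4f}")
```

A rate above 1 bit per channel symbol, as in case (b), is possible only because the channel input alphabet there has more than two letters.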

Figure 6A.1 Sections of generalized tree codes and corresponding segment of the
received signal sequence $r'$, with branches labeled by letters of the channel input
alphabet $\{a_i\}$: (a) $u = 2$, $v = 3$, $R_N = \tfrac{1}{3}$; (b) $u = 3$, $v = 1$, $R_N = \log_2 3$.

To adapt the Fano decoding algorithm to these more general channels
and tree codes, it is only necessary to generalize the tilted distance function
of Eq. 6.109. Letting $y_k^*$ denote the $k$th symbol of a hypothesized
transmitted sequence $lv$ channel input symbols long and letting $r_k'$ denote the
corresponding channel output symbol, we redefine (usefully, as we shall
see) the tilted distance $t(l)$ as

$$t(l) \triangleq \sum_{k=1}^{lv} z_k, \qquad (6A.2)$$

where

$$z_k \triangleq -\log_2 \frac{P[r_k' \mid y_k^*]}{f(r_k')} + R_N. \qquad (6A.3)$$

Here $f(r_k')$ serves as a scaling factor with value specified when $r_k'$ is $b_j$ by

$$f(b_j) = f_j \triangleq \sum_{i=1}^{A} p_i q_{ij}; \quad j = 1, 2, \ldots, Q, \qquad (6A.4)$$

in which $\{p_i\}$ is a set of non-negative numbers that sum to 1. We shall
soon introduce a random coding argument in which $\{p_i\}$ is the set of
probabilities with which channel input letters are assigned to codewords.
In this case $f_j$ is identified as the a priori probability of the channel output
letter $b_j$. Note that

$$\sum_{j=1}^{Q} f_j = 1. \qquad (6A.5)$$
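
A short sketch shows how the scaling factors f_j are formed and how they enter a per-symbol increment. The computation of f_j follows the definition in the text (f_j is the sum over i of p_i q_ij). The increment, however, is written in the familiar Fano form -log2(q_ij / f_j) + R_N, which is an assumption about Eq. 6A.3 rather than a transcription of it; function names are illustrative, and the channel is the BSC of the earlier example (p = 0.074, R_N = 1/3).

```python
import math

def output_probabilities(p, q):
    """f_j = sum_i p_i q_ij for each output letter b_j."""
    return [sum(p[i] * q[i][j] for i in range(len(p))) for j in range(len(q[0]))]

def increment(i, j, p, q, rate):
    """Assumed tilted-distance increment when a_i is hypothesized and b_j received."""
    f = output_probabilities(p, q)
    return -math.log2(q[i][j] / f[j]) + rate

p = [0.5, 0.5]
q = [[0.926, 0.074], [0.074, 0.926]]
f = output_probabilities(p, q)
z_agree = increment(0, 0, p, q, rate=1.0 / 3.0)      # hypothesis agrees with reception
z_disagree = increment(0, 1, p, q, rate=1.0 / 3.0)   # hypothesis disagrees

print(f, round(z_agree, 3), round(z_disagree, 3))
```

Under this assumed form the increment is negative when the hypothesized and received letters agree and positive when they disagree, so the tilted distance tends to fall along the correct path and to rise along incorrect ones.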

Error Probability 

Extension of the Fano algorithm to the class of situations considered
here affects neither the flow diagrams of Figs. 6.44 and 6.46 nor the decoder
block diagram structure of Fig. 6.48, although each $r$-register stage must
now be able to store any one of the $Q$ letters $\{b_j\}$. As mentioned in
Section 6.5, however, analysis of an actual decoder is complicated by the
possibility that the algorithm may require a search back into the code tree
of greater depth than the size of the $r$-register, in conjunction with the
data arrival rate and computational speed of the decoder, can
accommodate. We obviate the difficulty by assuming for purposes of analysis
that the $r$-register is infinite.

A second complication arises from statistical dependencies due to the
sliding constraint structure of a convolutional code. We obviate this
difficulty by resorting to an analytical trick with which we are already
familiar. Specifically, we consider an error event $\mathcal{E}_h$ analogous to the
event denoted by the same symbol in the suboptimum decoder analysis of
Section 6.3. We again define the $h$th starting node as the code-tree node
designated by decoder decisions $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{h-1}$. In defining $\mathcal{E}_h$ we assume
that a magic genie positions the search node pointer on the correct $h$th
starting node and that the Fano algorithm is constrained to act as if this
node were the origin of the code tree. Thus the decoder is allowed to search
forward, but never backward, from the starting node. We then define $\mathcal{E}_h$ as
the event that decision $\hat{x}_h$, determined at the instant when the search node
pointer first reaches any one of the $u^K$ nodes that lie $K$ branches beyond
the $h$th starting node, is incorrect. We shall bound the mean probability,
$\overline{P[\mathcal{E}_h]}$, of the event $\mathcal{E}_h$ over an ensemble of convolutional codes. The
significance of the bound with respect to an actual decoder is discussed at
the end of this section.

Labeling of nodes in the incorrect subset for $\hat{x}_h$. The derivations
that follow specialize a method of analysis devised and used by Stiglitz
and Yudkin to obtain more general results.† The derivations are
simplified by introducing additional notation. We first define the incorrect
subset for $\hat{x}_h$ as the set of all nodes connected to the correct $h$th starting
node via one of the $u - 1$ incorrect branches diverging therefrom, plus
the starting node itself. (See Fig. 6A.2.) We say that a node in the
incorrect subset for $\hat{x}_h$ is at depth $l$ if it is connected to the $h$th starting node
by a path containing $l$ branches. Thus there is 1 node (the starting node)
at depth 0; $u - 1$ nodes at depth 1; $(u - 1)u$ at depth 2; and $(u - 1)u^{l-1}$
at depth $l$, all $l > 0$. We observe from Eq. 6A.1 that the number of nodes,
say $N_l$, in the incorrect subset at depth $l$ satisfies

$$N_l \le 2^{lvR_N}; \quad l = 0, 1, \ldots. \qquad (6A.6)$$

† I. G. Stiglitz and H. L. Yudkin, Probabilistic Decoding, to be published.
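
The node count and the bound of Eq. 6A.6 can be checked by direct enumeration. The sketch uses the fact, from Eq. 6A.1, that 2 raised to the power l v R_N equals u to the power l; the function names are illustrative.

```python
def nodes_at_depth(u, l):
    """Number of nodes of the incorrect subset at depth l for u branches per node."""
    if l == 0:
        return 1                       # the starting node itself
    return (u - 1) * u ** (l - 1)      # u - 1 incorrect first branches, then full fan-out

def depth_bound(u, l):
    """Right-hand side of Eq. 6A.6: 2^(l v R_N) = u^l, since v R_N = log2(u)."""
    return u ** l

for u in (2, 3):
    counts = [nodes_at_depth(u, l) for l in range(5)]
    bounds = [depth_bound(u, l) for l in range(5)]
    print(u, counts, bounds)
```

The bound holds with equality only at depth 0; at every greater depth the count is smaller than the bound by the factor (u - 1)/u.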

It will be convenient to label nodes in the incorrect subset for $\hat{x}_h$ by an
ordered pair of integers $(l, n)$. The number $l$ indicates the depth of the
node; the number $n$ indicates the node's vertical position in the code tree
relative to all $N_l$ nodes of depth $l$. The labeling is illustrated in Fig. 6A.2.

Figure 6A.2 Section of incorrect subset for $\hat{x}_h$. If the heavy line indicates the correct
(transmitted) path, the incorrect subset for the decision $\hat{x}_h$ consists of all nodes outside
the dashed line. These nodes are labeled as shown.

Bound on $P[\mathcal{E}_h]$. For any particular code and transmitted path we
define $P[\mathcal{E}_h]$ as the conditional probability that $\hat{x}_h$ is incorrect at the instant
that the algorithm first reaches a search node $K$ branches beyond the $h$th
starting node, given that the genie initially positions the search node
pointer on the correct starting node and that searching behind the starting
node is prohibited. We observe from Fig. 6A.3 that the starting conditions
(which we always assume hereafter) imply that the event $\mathcal{E}_h$
cannot occur unless some node at depth $K$ in the incorrect subset for $\hat{x}_h$
has a $t$-value satisfying the lowest threshold, say $T_c$, simultaneously satisfied
by all nodes along the segment of the correct path extending $K$ branches
beyond the $h$th starting node. Otherwise, the algorithm requires the
decoder to trace out the correct path in preference to any of these
incorrect paths.



Figure 6A.3 Details for bounding $P[\mathcal{E}_h]$. The decoder cannot follow the path to the
node labeled $(K, 3)$ without violating $T_c$, so that this node cannot cause an error. On the
other hand, the decoder can reach node $(K, 7)$ without violating $T_c$, so that $(K, 7)$ may
cause an error. Whether an error is caused by a node $(K, n)$ with $t$-value $\le T_c$ depends
on the ordering of the tree search. If the path connecting such a node to the $h$th starting
node everywhere satisfies threshold $T_c - \Delta$, an error is made.


To write this observation mathematically, we first define $w(\ell, n)$ as the
difference between the $t$-value of node $(\ell, n)$ and the maximum $t$-value
along the $K$-branch extension of the correct path. Let $H^\circ$ denote the
branch depth at which this maximum occurs, let $z'_k$ denote the $k$th
increment to the $t$-value of node $(\ell, n)$, and let $z^\circ_k$ denote the $k$th increment
to the $t$-value along the correct path. From Eq. 6A.2,

$$w(\ell, n) = \left[ t_h + \sum_{k=1}^{\ell v} z'_k \right] - \left[ t_h + \sum_{k=1}^{H^\circ v} z^\circ_k \right]
= \sum_{k=1}^{\ell v} z'_k - \sum_{k=1}^{H^\circ v} z^\circ_k, \tag{6A.7}$$

in which $t_h$ is the $t$-value of the starting node and the index $k$ counts
symbols to the right of this node. The range of $H^\circ$ is $0 \le H^\circ \le K$. (If
$H^\circ = 0$, the second sum on the right-hand side of Eq. 6A.7 is identically
zero.)

It is clear from Fig. 6A.3 that one of the $N_K$ incorrect nodes at depth
$K$, say node $(K, n)$, cannot satisfy $T_c$ unless $w(K, n) < \Delta$. Invoking the
familiar union bound, we have

$$P[\mathcal{E}_h] \le P[w(K, 1) < \Delta \text{ or } w(K, 2) < \Delta \text{ or } \cdots \text{ or } w(K, N_K) < \Delta]
\le \sum_{n=1}^{N_K} P[w(K, n) < \Delta]. \tag{6A.8}$$

Bounding $P[w(\ell, n) < \Delta]$. As a preliminary to overbounding the
right-hand side of Eq. 6A.8, we account for ignorance of the specific value
of $H^\circ$ in the definition of $w(\ell, n)$, Eq. 6A.7, by the following argument.
Let $w(\ell, n, H)$ be the value of $w(\ell, n)$ in Eq. 6A.7 when $H^\circ$ is replaced
by $H$, $0 \le H \le K$. Then

$$P[w(\ell, n) < \Delta] = P[w(\ell, n, 0) < \Delta \text{ or } w(\ell, n, 1) < \Delta \text{ or } \cdots \text{ or } w(\ell, n, K) < \Delta]
\le \sum_{H=0}^{K} P_H, \tag{6A.9a}$$

in which we define

$$P_H = P[w(\ell, n, H) < \Delta]. \tag{6A.9b}$$

An upper bound on $P_H$ can be obtained by using two techniques,
Chernoff bounding and random coding, with which we are already
familiar. Letting $f(\,)$ denote the unit step function (as in Eq. 6.55a and
sequel), we observe from Eq. 6A.7 (with $H^\circ$ replaced by $H$) that

$$P_H = \overline{f[\Delta - w(\ell, n, H)]}
\le \overline{\exp[\lambda\{\Delta - w(\ell, n, H)\}]}
= e^{\lambda\Delta}\, \overline{\exp\left[-\lambda\left(\sum_{k=1}^{\ell v} z'_k - \sum_{k=1}^{Hv} z^\circ_k\right)\right]}; \qquad \lambda \ge 0. \tag{6A.10}$$


Figure 6A.4 Generalized convolutional coder. The input message $\mathbf{x}$ has components
$\{x_h\}$ which can be any one of a set of $u$ symbols. The input transducer identifies each $x_h$
in turn and applies a 1 to the $k$th binary shift register if and only if $x_h$ is the $k$th symbol
in the set. Each of the $Ku$ shift-register stages is connected to each of the $v'$ modulo-$u$
adders by a line that multiplies the stage’s content by a number which can be preassigned
independently as any one of the integers $0, 1, \ldots, u - 1$. Thus each $x_h$ produces a
sequence of $v'$ integers at the commutator output. The output transducer breaks the
sequence into groups of length $v'/v$ and maps each group into one of the channel input
letters $\{a_i\}$, as described in connection with Fig. 6.8. The number of adders, $v'$, is taken
to be large enough so that the $\{a_i\}$ can be generated with the desired probabilities $\{p_i\}$.
(A more efficient coder design utilizes only $\beta$ $K$-stage shift registers, $\log_2 u \le \beta < 1 +
\log_2 u$.)

The expectation in Eq. 6A.10 is over the ensemble of channel noise
sequences. To simplify the expression further we use random coding and
evaluate the mean, $\overline{P}_H$, of $P_H$ over an appropriate ensemble of codes.
In particular, we consider generalized convolutional encoders like that in
Fig. 6A.4 and the ensemble of codes obtained when each lead of the
connection matrix is statistically independent and equally likely to have
weight $0, 1, \ldots, u - 1$. We assume that each coder in the ensemble uses
an identical transducer, so designed that the channel input letters $\{a_i\}$ are
assigned with probabilities $\{p_i\}$, $i = 1, 2, \ldots, A$. For this ensemble the
correct path and any incorrect subset path to depth $\ell \le K$ are statistically
independent. Moreover, by letting $y_k$ denote the $k$th symbol along the
correct path, $r_k$ denote the $k$th received symbol and $y'_k$ denote the $k$th
symbol along the (incorrect) path leading to node $(\ell, n)$, we have

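$$P[y_k = a_i,\ r_k = b_j,\ y'_k = a_h] = p_i q_{ij} p_h; \qquad i, h = 1, 2, \ldots, A; \quad j = 1, 2, \ldots, Q, \tag{6A.11}$$

independently for each value of $k$. Taking an overhead bar (now and
hereafter) to mean expectation also with respect to the specified code
ensemble yields

$$\overline{P}_H \le e^{\lambda\Delta}\, \overline{\exp\left[-\lambda\left(\sum_{k=1}^{\ell v} z'_k - \sum_{k=1}^{Hv} z^\circ_k\right)\right]}
= \begin{cases}
e^{\lambda\Delta} \prod_{k=1}^{Hv} \overline{\exp\{\lambda(z^\circ_k - z'_k)\}} \prod_{k=Hv+1}^{\ell v} \overline{\exp(-\lambda z'_k)}; & H \le \ell \\
e^{\lambda\Delta} \prod_{k=1}^{\ell v} \overline{\exp\{\lambda(z^\circ_k - z'_k)\}} \prod_{k=\ell v+1}^{Hv} \overline{\exp(\lambda z^\circ_k)}; & H > \ell.
\end{cases} \tag{6A.12}$$

In writing Eq. 6A.12 in factored form, we have exploited the fact that
the probability assignment of Eq. 6A.11 is made with statistical
independence for different $k$.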

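Evaluation of factors. Evaluation of the individual averages in Eq.
6A.12 is straightforward. By using Eq. 6A.11 and the generalized tilted
distance definition of Eq. 6A.3, we have

$$\overline{\exp[\lambda(z^\circ_k - z'_k)]} = \sum_{i=1}^{A} \sum_{j=1}^{Q} \sum_{h=1}^{A} p_i q_{ij} p_h \exp\left[\lambda\left(-\log_2 \frac{q_{ij}}{f_j} + \log_2 \frac{q_{hj}}{f_j}\right)\right].$$

Setting $\lambda = \tfrac{1}{2}\log_e 2$ (which is positive), a choice which simplifies the
analysis, yields

$$\overline{\exp[\lambda(z^\circ_k - z'_k)]} = \sum_i \sum_j \sum_h p_i q_{ij} p_h \left(\frac{q_{hj}}{q_{ij}}\right)^{1/2}
= \sum_{j=1}^{Q} \left[\sum_{i=1}^{A} p_i \sqrt{q_{ij}}\right]^2
= 2^{-R_0'}; \qquad \lambda = \tfrac{1}{2}\log_e 2, \tag{6A.13}$$

where $R_0'$ is the exponential bound parameter defined in Eq. 6.62b.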
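The identity in Eq. 6A.13 can be checked numerically. The sketch below is an illustration only: it takes the input probabilities `p` as given (no maximization over them), represents the channel as a nested list `q[i][j]`, and uses function names that are not from the text.

```python
import math


def r0_prime(p, q):
    """-log2 of sum_j (sum_i p_i sqrt(q_ij))^2, in bits per channel use."""
    total = 0.0
    for j in range(len(q[0])):
        inner = sum(p[i] * math.sqrt(q[i][j]) for i in range(len(p)))
        total += inner * inner
    return -math.log2(total)


def triple_sum(p, q):
    """Left side of Eq. 6A.13 with lambda = (1/2) ln 2.

    p_i q_ij p_h (q_hj / q_ij)^(1/2) is written as p_i p_h sqrt(q_ij q_hj)
    so that zero transition probabilities need no special handling.
    """
    total = 0.0
    for i in range(len(p)):
        for j in range(len(q[0])):
            for h in range(len(p)):
                total += p[i] * p[h] * math.sqrt(q[i][j] * q[h][j])
    return total
```

For a BSC with crossover probability 0.1 and equally likely inputs, `triple_sum` evaluates to 0.8 up to rounding and agrees with `2 ** -r0_prime(p, q)`.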

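The remaining factors in Eq. 6A.12 are evaluated in much the same way.
First

$$\overline{\exp[\lambda z^\circ_k]} = \sum_{i=1}^{A} \sum_{j=1}^{Q} p_i q_{ij} \exp\left[\lambda\left(-\log_2 \frac{q_{ij}}{f_j} + R_N\right)\right]
= 2^{\frac{1}{2}R_N} \sum_i \sum_j p_i \sqrt{q_{ij}} \sqrt{f_j}; \qquad \lambda = \tfrac{1}{2}\log_e 2.$$

A straightforward extension of the Schwarz inequality, Eq. 4.64, to
multidimensional vectors $\mathbf{a}$ and $\mathbf{b}$ yields

$$\sum_{j=1}^{Q} a_j b_j \le \left[\sum_{j=1}^{Q} a_j^2\right]^{1/2} \left[\sum_{j=1}^{Q} b_j^2\right]^{1/2}.$$

Hence

$$\overline{\exp[\lambda z^\circ_k]} = 2^{\frac{1}{2}R_N} \sum_{j=1}^{Q} \sqrt{f_j} \left[\sum_{i=1}^{A} p_i \sqrt{q_{ij}}\right]
\le 2^{\frac{1}{2}R_N} \left[\sum_{j=1}^{Q} f_j\right]^{1/2} \left[\sum_{j=1}^{Q} \left(\sum_{i=1}^{A} p_i \sqrt{q_{ij}}\right)^2\right]^{1/2}
= 2^{-\frac{1}{2}[R_0' - R_N]}, \tag{6A.14}$$

in which we have used Eq. 6A.5 to equate $\sum_j f_j = 1$.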
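The Schwarz step in Eq. 6A.14 can be illustrated numerically. In this sketch the weights $f_j$ are taken to be the output-letter probabilities $\sum_i p_i q_{ij}$; that choice is an assumption made here for illustration (Eq. 6A.4 is not reproduced in this section), and the inequality itself only needs $\sum_j f_j = 1$. All names are hypothetical.

```python
import math


def output_probs(p, q):
    """Assumed weights: f_j = sum_i p_i q_ij, which sum to 1."""
    return [sum(p[i] * q[i][j] for i in range(len(p)))
            for j in range(len(q[0]))]


def mean_exp_correct(p, q, f, rn):
    """2^(R_N/2) * sum_i sum_j p_i sqrt(q_ij f_j), the exact average."""
    s = sum(p[i] * math.sqrt(q[i][j] * f[j])
            for i in range(len(p)) for j in range(len(f)))
    return 2.0 ** (0.5 * rn) * s


def schwarz_bound(p, q, rn):
    """Right side of Eq. 6A.14: 2^(-(R_0' - R_N)/2)."""
    s = sum(sum(p[i] * math.sqrt(q[i][j]) for i in range(len(p))) ** 2
            for j in range(len(q[0])))  # s equals 2^(-R_0')
    return 2.0 ** (0.5 * rn) * math.sqrt(s)
```

For a symmetric channel with equally likely inputs the two sides coincide; for an asymmetric channel the Schwarz bound is strictly larger.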

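The last factors in Eq. 6A.12 are evaluated with the help of Eq. 6A.4.
We have

$$\overline{\exp[-\lambda z'_k]} = \sum_{h=1}^{A} \sum_{j=1}^{Q} p_h f_j \exp\left[\lambda\left(\log_2 \frac{q_{hj}}{f_j} - R_N\right)\right]
= 2^{-\frac{1}{2}R_N} \sum_h \sum_j p_h f_j \left(\frac{q_{hj}}{f_j}\right)^{1/2}
= 2^{-\frac{1}{2}R_N} \sum_j \sqrt{f_j} \left[\sum_h p_h \sqrt{q_{hj}}\right]; \qquad \lambda = \tfrac{1}{2}\log_e 2.$$

Application of the Schwarz inequality yields

$$\overline{\exp[-\lambda z'_k]} \le 2^{-\frac{1}{2}[R_N + R_0']}. \tag{6A.15}$$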

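Recombination. The desired bound on $P[\mathcal{E}_h]$ follows from a series of
substitutions. First, substituting Eqs. 6A.13-15 into Eq. 6A.12 (with
$\lambda = \tfrac{1}{2}\log_e 2$), we have

$$\overline{P}_H \le \begin{cases}
2^{\frac{1}{2}\Delta}\, [2^{-R_0'}]^{Hv}\, [2^{-\frac{1}{2}(R_N + R_0')}]^{(\ell - H)v}; & H \le \ell \le K \\
2^{\frac{1}{2}\Delta}\, [2^{-R_0'}]^{\ell v}\, [2^{\frac{1}{2}(R_N - R_0')}]^{(H - \ell)v}; & H > \ell.
\end{cases}$$

Hence

$$\overline{P}_H \le 2^{\frac{1}{2}\Delta}\, 2^{-\frac{1}{2}\ell v[R_0' + R_N]}\, 2^{-\frac{1}{2}Hv[R_0' - R_N]}; \qquad \text{all } H, \ell \le K. \tag{6A.16}$$

By averaging Eq. 6A.9 over the ensemble of codes and substituting Eq.
6A.16 we obtain

$$\overline{P[w(\ell, n) < \Delta]} \le 2^{\frac{1}{2}\Delta}\, 2^{-\frac{1}{2}\ell v[R_0' + R_N]} \sum_{H=0}^{K} 2^{-\frac{1}{2}Hv[R_0' - R_N]}.$$

The bound is weakened by extending the sum to infinity. Thus

$$\overline{P[w(\ell, n) < \Delta]} \le A_0\, 2^{-\frac{1}{2}\ell v[R_0' + R_N]}; \qquad R_N < R_0', \tag{6A.17a}$$

in which the coefficient $A_0$ is defined as

$$A_0 \triangleq \frac{2^{\frac{1}{2}\Delta}}{1 - 2^{-\frac{1}{2}v[R_0' - R_N]}}. \tag{6A.17b}$$

Setting $\ell = K$ and substituting in Eq. 6A.8 yields the final result,

$$\overline{P[\mathcal{E}_h]} \le N_K A_0\, 2^{-\frac{1}{2}Kv[R_0' + R_N]}; \qquad R_N < R_0'.$$

By virtue of Eq. 6A.6 this may be written

$$\overline{P[\mathcal{E}_h]} \le A_0\, 2^{-\frac{1}{2}Kv[R_0' - R_N]}; \qquad R_N < R_0'. \tag{6A.18}$$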
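To get a feel for Eq. 6A.18, the bound can be evaluated for sample parameters. This is a minimal sketch: the coefficient is taken as $A_0 = 2^{\Delta/2} / (1 - 2^{-v[R_0' - R_N]/2})$ from the geometric sum in the derivation, and the function and argument names are illustrative.

```python
def a0(delta, v, r0, rn):
    """Coefficient A_0 of the bound; requires R_N < R_0'."""
    if rn >= r0:
        raise ValueError("bound holds only for R_N < R_0'")
    return 2.0 ** (0.5 * delta) / (1.0 - 2.0 ** (-0.5 * v * (r0 - rn)))


def error_bound(k, v, r0, rn, delta=2.0):
    """Right side of Eq. 6A.18 for constraint length k (in branches)."""
    return a0(delta, v, r0, rn) * 2.0 ** (-0.5 * k * v * (r0 - rn))
```

Each additional branch of constraint length multiplies the bound by $2^{-v[R_0' - R_N]/2}$. For small $K$ the value can exceed 1, in which case the bound is vacuous.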

We find that the ensemble average probability of the event “$\hat{x}_h$ incorrect
when the tree search first penetrates $K$ branches beyond the (correct)
starting node” decays exponentially with increasing code constraint
length $K$, provided $R_N < R_0'$. Stiglitz and Yudkin use more sensitive
arguments to derive tighter bounds.

Interpretation. Relating the bound on $P[\mathcal{E}_h]$ to the error probability of
an actual Fano decoder involves a series of arguments. As with the
suboptimum decoder of Section 6.3, we presume that a long sequence
$\mathbf{x} = (x_1, x_2, \ldots, x_L)$ of coder input symbols is to be communicated and
count an error unless all $L$ symbols are decoded correctly. Because the
bound of Eq. 6A.18 is independent of $h$ and the transmitted path, when the
genie-aided decoder is used to decode the entire sequence one symbol
after another the mean error probability is overbounded by $L\,\overline{P[\mathcal{E}_h]}$. Thus
the error probability can be made arbitrarily small by appropriate choice
of $K$ and $R_N$. For ease of reference we call the genie-aided decoder $D_I$.

Now consider a second decoder, say $D_{II}$, identical to $D_I$ except that
the $h$th starting node, $h = 1, 2, \ldots, L$, is determined by the preceding
decoder decisions $\hat{x}_1, \ldots, \hat{x}_{h-1}$ rather than by the genie. Because $D_{II}$
determines $\mathbf{x}$ correctly if and only if $D_I$ does, $L\,\overline{P[\mathcal{E}_h]}$ is an upper bound on
the error performance of $D_{II}$ even though the genie has been dismissed.†

The next argument is more subtle. Consider a third decoder, say $D_{III}$,
which like $D_I$ and $D_{II}$ has an infinite $r$-register but which uses a search
algorithm more akin to that of an actual machine. In particular, for $D_{III}$
the only modification of the Fano algorithm is that we prohibit the change
of any decision $\hat{x}_h$, $h = 1, 2, \ldots, L$, once the search node pointer has
first penetrated to a node $\ell = h + K - 1$ branches deep in the code tree.
Thus $D_{III}$ maintains a running record of its path of deepest penetration
to date and at any instant treats the node $K - 1$ branches back from the
point of deepest penetration as if it were the code origin. Decoders II and
III differ in that III does not return to the next starting node after each
decision is made. We shall see, however, that these two decoders always
make the same decisions, even when they decode incorrectly.

† The role of the genie in the derivation is to avoid the necessity of conditioning the
ensemble used in analyzing $P[\mathcal{E}_h]$ on the event “$\hat{x}_1, \ldots, \hat{x}_{h-1}$ correct.” Specification
of the probability assignment for the conditional ensemble would be difficult, if not
impossible.

The equivalence of $D_{III}$ and $D_{II}$ may be demonstrated with the help of
Fig. 6A.5. Consider $D_{II}$; let node 1 be its starting node and let node 3 be
its “decision node” for $\hat{x}_h$. By the decision node for $\hat{x}_h$ we mean the terminus
of the first path constructed by $D_{II}$ extending $K$ branches beyond its $h$th
starting node. We therefore identify node 2 as $D_{II}$’s starting node for
$\hat{x}_{h+1}$. In the process of determining $\hat{x}_h$, $D_{II}$ may need to pass through
node 2 many times, with successively higher values of the running threshold
$T$. The important point, however, is that the value of $T$ required to reach
node 3 depends solely on $t(\ell)$ along the path from node 2 to node 3 and is
independent of the $t$-value of node 1. Moreover, node 3 is the first node
reached by $D_{II}$ at depth $\ell = h + K - 1$.





In the process of determining the next decision $\hat{x}_{h+1}$, $D_{II}$ first starts
at node 2 and again moves forward with running thresholds that are
independent of the $t$-value of node 1. It follows from careful consideration
of the Fano algorithm that $D_{II}$ must again arrive at node 3 before reaching
any other node at depth $\ell = h + K - 1$. In addition, $D_{II}$ arrives at node 3
the second time with $T$ and $\theta$ having the same values they had on the first
arrival. [These facts, of course, do not preclude subsequent retrogression
before the decision node for $\hat{x}_{h+1}$, say node 4 in Fig. 6A.5, is finally reached.]

The equivalence of decoders II and III is immediate. Start with the
determination of $\hat{x}_1$. Both decoders begin at the code origin and
simultaneously reach the same decision node (for $\hat{x}_1$) with the same value of $T$.
Now let $D_{II}$ move back to the starting node for $\hat{x}_2$, holding $D_{III}$ immobile
until $D_{II}$ rejoins it. Then release $D_{III}$ and again allow it to move in
synchronism with $D_{II}$ to a decision node for $\hat{x}_2$. Since both decoders follow
the same rules and depart from the same node with the same initial
conditions, this decision node and all movements in reaching it are also
common. Continuing in this way, we see that the operations performed
by $D_{II}$ and $D_{III}$ are identical, except that $D_{II}$ must redo many operations
which $D_{III}$ elides. Thus the mean error probability for $D_{III}$ is also
overbounded by $L\,\overline{P[\mathcal{E}_h]}$ and can be made arbitrarily small.

The remaining task is to relate $P[\mathcal{E}_h]$ to an actual decoder. We first
note that extension of the $\hat{x}$-register beyond the end of the $r$-register, as in
Fig. 6.58, guarantees that the actual decoder (like $D_{III}$) will not release
any $\hat{x}_h$ until the search node pointer has reached a node (at least) $K$ branches
deeper in the code tree. The significance of $P[\mathcal{E}_h]$ arises from the fact that
an actual decoder exactly duplicates the movements of $D_{III}$ whenever two
conditions are met. The first condition is that the actual machine
determines all $L$ of the $\hat{x}_h$ without incurring a search $K - 1$ or more branches
back from the node of deepest penetration; the second is that the (finite)
$r$-register does not overflow. Since the number of branches typically
observed by a decoder in a deep search is enormous, we anticipate that
the second condition will dominate in practice. We therefore interpret
the bound on $P[\mathcal{E}_h]$ as a conservative estimate of the mean probability of
undetected error per message symbol decoded. The estimate is
conservative because it neglects the likelihood that overflow will occur before
an erroneous decision is released at the $\hat{x}$-register output. When the
$\hat{x}$-register extension is greater than $K$, as in Fig. 6.58, we expect the estimate
to be exceedingly conservative.

Mean Computation 

We now bound the mean number of computations (over the ensemble
of convolutional codes) performed by decoder $D_I$ in decoding $x_1, x_2, \ldots, x_L$.
Since the analysis discounts repetitive computation in retracing from each
genie-specified starting node to the first node encountered $K - 1$ branches
beyond it, the result estimates computations performed by $D_{III}$ when $D_I$
decodes all $L$ of the $\{x_h\}$ correctly, hence estimates the computations
performed by an actual decoder in the absence of overflow.

We define computation by saying that the decoder performs one
computation each time it enters either the “look forward” or the “look back”
box in the flow diagram of Fig. 6.46. In particular, we define $\overline{B}_h$ as the
total mean number of times the decoder enters either box with its search
node pointer on nodes in the incorrect subset for $\hat{x}_h$. We observe from
Fig. 6A.2 that every node in the code tree belongs to the incorrect subset
for one and only one $\hat{x}_h$, $h = 1, 2, \ldots, L$. Thus the mean number of
computations per decoded symbol, say $\overline{B}$, is given by

$$\overline{B} = \frac{1}{L} \sum_{h=1}^{L} \overline{B}_h. \tag{6A.19}$$

We overbound the $\{\overline{B}_h\}$, hence $\overline{B}$, by exploiting two basic properties of
the search algorithm. (i) At most $u + 1$ computations can be performed
with the search node pointer on a given node and with a given threshold
in force. (ii) The maximum number of different thresholds ever used
with the pointer on a node in the incorrect subset for $\hat{x}_h$ is equal to the
number of thresholds lying on or above the $t$-value of that node but below
$T_c + \Delta$, where $T_c$ has been defined in Fig. 6A.3.

Property (i) is true because, with a given threshold in force, only one
computation may be performed in looking ahead on each of the $u$ branches
diverging from the node and only one computation may be performed in
looking back from the node. Indeed, the function of the variable $\theta$ in the
search algorithm is to preclude the possibility of looking along any branch
more than once with the same threshold; were this not so, the decoder
might enter an endless loop.

Property (ii) is true because the search node pointer cannot move to any
node unless the running threshold is satisfied by the node’s $t$-value and
because the algorithm never uses a threshold higher than $T_c$ while searching
the incorrect subset for $\hat{x}_h$.

In bounding $\overline{B}_h$, we first consider a particular code and transmitted
path and use these two properties to note that the number of
computations, say $B(\ell, n)$, performed on node $(\ell, n)$ of the incorrect subset for
$\hat{x}_h$ is bounded by

$$B(\ell, n) \le (u + 1)\, g[w(\ell, n)]. \tag{6A.20}$$

Here the staircase function $g(\,)$, plotted in Fig. 6A.6, bounds the
maximum number of thresholds that can be in force when $(\ell, n)$ is the search
node. Observing that

$$g(\alpha) = \sum_{j=-1}^{\infty} f(-\alpha - j\Delta), \tag{6A.21}$$

where $f(\,)$ is the unit step function used in the derivation of Eq. 6A.10,
we have

$$B(\ell, n) \le (u + 1) \sum_{j=-1}^{\infty} f[-w(\ell, n) - j\Delta]. \tag{6A.22}$$

Taking the expectation of both sides of this inequality with respect to the
probability assignment of Eq. 6A.11, we obtain

$$\overline{B(\ell, n)} \le (u + 1) \sum_{j=-1}^{\infty} \overline{P[w(\ell, n) < -j\Delta]}. \tag{6A.23}$$


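We now observe from its derivation that the validity of the bound of
Eq. 6A.17 does not depend on the specific value assigned to the parameter
$\Delta$. Substituting $-j\Delta$ for $\Delta$ in Eq. 6A.17 and applying the result to Eq.
6A.23, we obtain for $\ell \le K$

$$\overline{B(\ell, n)} \le A_1\, 2^{-\frac{1}{2}\ell v[R_0' + R_N]} \sum_{j=-1}^{\infty} 2^{-\frac{1}{2}j\Delta}
= A_1\, \frac{2^{\frac{1}{2}\Delta}}{1 - 2^{-\frac{1}{2}\Delta}}\, 2^{-\frac{1}{2}\ell v[R_0' + R_N]}; \qquad R_N < R_0', \tag{6A.24a}$$

in which the coefficient $A_1$ is defined as

$$A_1 \triangleq \frac{u + 1}{1 - 2^{-\frac{1}{2}v[R_0' - R_N]}}. \tag{6A.24b}$$

The bound is minimized by choosing $\Delta$, the threshold spacing, equal to 2.
With this choice

$$\overline{B(\ell, n)} \le 4A_1\, 2^{-\frac{1}{2}\ell v[R_0' + R_N]}; \qquad R_N < R_0'. \tag{6A.25}$$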
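The claim that a threshold spacing of 2 minimizes the bound can be checked directly. The only part of the bound that depends on the spacing is the factor $2^{\Delta/2} / (1 - 2^{-\Delta/2})$ produced by the geometric sum over thresholds; the sketch below (with an illustrative function name) evaluates it.

```python
def spacing_factor(delta: float) -> float:
    """Spacing-dependent factor 2^(delta/2) / (1 - 2^(-delta/2)), delta > 0."""
    if delta <= 0:
        raise ValueError("threshold spacing must be positive")
    x = 2.0 ** (0.5 * delta)
    return x / (1.0 - 1.0 / x)  # equals x**2 / (x - 1)
```

Writing $x = 2^{\Delta/2}$, the factor is $x^2/(x - 1)$, whose derivative vanishes at $x = 2$, that is, at $\Delta = 2$, where the factor equals 4. Smaller spacings mean many thresholds per node; larger spacings loosen the Chernoff term.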

The mean number of computations $\overline{B}_h$ is, by definition of decoder $D_I$,
the mean of the total nonrepetitive computations performed on nodes of
the incorrect subset for $\hat{x}_h$ at depths less than or equal to $K$. Since the mean
of a sum is the sum of the means,

$$\overline{B}_h = \sum_{\ell=0}^{K} \sum_{n=1}^{N_\ell} \overline{B(\ell, n)}.$$

Thus, again by virtue of Eq. 6A.6,

$$\overline{B}_h \le 4A_1 \sum_{\ell=0}^{K} N_\ell\, 2^{-\frac{1}{2}\ell v[R_0' + R_N]}
\le 4A_1 \sum_{\ell=0}^{K} 2^{-\frac{1}{2}\ell v[R_0' - R_N]}, \tag{6A.26}$$

$$\overline{B}_h < \frac{4(u + 1)}{\left[1 - 2^{-\frac{1}{2}v[R_0' - R_N]}\right]^2}; \qquad R_N < R_0'. \tag{6A.27}$$

The bound of Eq. 6A.27 is independent of $h$. From Eq. 6A.19 the
right-hand side of Eq. 6A.27 is therefore also an upper bound on $\overline{B}$. If
$R_N < R_0'$, the mean computation per message symbol decoded is bounded
by a number that is independent of the code constraint length $K$.
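The independence from $K$ can be seen numerically. The sketch below assumes the coefficient $A_1 = (u + 1) / (1 - 2^{-v[R_0' - R_N]/2})$ and the threshold spacing $\Delta = 2$, and compares the finite sum over depths $0, \ldots, K$ with its limiting value $4(u + 1) / (1 - 2^{-v[R_0' - R_N]/2})^2$; the names are illustrative.

```python
def a1(u, v, r0, rn):
    """Assumed coefficient A_1 = (u + 1) / (1 - 2^(-v (R_0' - R_N) / 2))."""
    if rn >= r0:
        raise ValueError("bound holds only for R_N < R_0'")
    return (u + 1) / (1.0 - 2.0 ** (-0.5 * v * (r0 - rn)))


def partial_bound(k, u, v, r0, rn):
    """Finite sum 4 A_1 * sum_{l=0..k} 2^(-l v (R_0' - R_N) / 2)."""
    x = 2.0 ** (-0.5 * v * (r0 - rn))
    return 4.0 * a1(u, v, r0, rn) * sum(x ** l for l in range(k + 1))


def limit_bound(u, v, r0, rn):
    """Value approached as k grows: 4 (u + 1) / (1 - x)^2."""
    x = 2.0 ** (-0.5 * v * (r0 - rn))
    return 4.0 * (u + 1) / (1.0 - x) ** 2
```

The finite sum grows with $K$ but never reaches the limit, so the limit serves as a $K$-independent bound. It grows without limit as $R_N$ approaches $R_0'$.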

For data rates $R_N < R_0'$ and decoder $D_I$ we have seen so far that over
an ensemble of convolutional codes both $\overline{B}$ and the over-all mean
probability of one or more errors, say $\overline{P[\mathcal{E}]}$, are individually bounded,
respectively, by a constant and by an exponentially decreasing function of $K$.
(The bound on $\overline{P[\mathcal{E}]}$ follows from Eq. 6A.18 and the inequality $\overline{P[\mathcal{E}]} \le
L\,\overline{P[\mathcal{E}_h]}$.) It remains to be shown that there are convolutional codes for
which both bounds are satisfied simultaneously. The argument proving
that such codes do exist is a simple extension of one with which we are
already familiar; we know that no more than a fraction $1/a$ of the codes
in the ensemble can yield

$$P[\mathcal{E}] > a\,\overline{P[\mathcal{E}]} \tag{6A.28a}$$

and that no more than a fraction $1/b$ of the codes can require a number of
computations

$$B > b\,\overline{B}, \tag{6A.28b}$$

where the left-hand sides of Eqs. 6A.28a and b denote quantities averaged
only over the channel noise. The worst possible situation would obtain
if the set of codes that produces high error probabilities were disjoint
from the set that produces high computational requirements. It follows
immediately that a fraction of at least $(1 - 1/a - 1/b)$ of the codes in the
ensemble simultaneously satisfies inequalities converse to Eqs. 6A.28a
and b. Thus, with $a = b = 10$, at least 80 per cent of the convolutional codes yield

$$P[\mathcal{E}] \le 10\,\overline{P[\mathcal{E}]} \tag{6A.29a}$$

$$B \le 10\,\overline{B}. \tag{6A.29b}$$
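The counting argument can be written out as a short worked step. The notation $X$ below is introduced here only for illustration: it stands for either code-dependent quantity (the error probability or the computation count, each averaged over the channel noise), and $\overline{X}$ is its mean over the ensemble of codes.

```latex
% Markov's inequality for a nonnegative quantity X with ensemble mean \overline{X}:
\Pr\left[X > a\,\overline{X}\right] \le \frac{\overline{X}}{a\,\overline{X}} = \frac{1}{a}

% Union bound over the two sets of "bad" codes (worst case: the sets are disjoint):
\Pr\left[\text{error too high, or computation too high}\right] \le \frac{1}{a} + \frac{1}{b}

% Fraction of codes that are good in both respects, for a = b = 10:
1 - \frac{1}{10} - \frac{1}{10} = 0.8
```

The same step holds when $\overline{X}$ is replaced by any upper bound on the mean, which is how the bounds of Eqs. 6A.18 and 6A.27 enter.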

PROBLEMS 

6.1 Consider the parity check coder in Fig. P6.1.

a. How should the coder be connected in order to produce the following
transformations?

x3 x2 x1     y5 y4 y3 y2 y1

0  1  1      1  0  1  0  1
1  0  1      0  1  0  1  0
0  1  0      0  0  1  1  0

b. Give a complete listing of the transformation x -> y.

Figure P6.1

6.2 The following code is to be used on a BSC with transition probability $p$:

$y_0$: 0 0 0 0 0

$y_1$: 0 1 1 1 0

$y_2$: 1 0 1 0 1

$y_3$: 1 1 0 1 1

a. Show that the minimum Hamming distance between any two codewords is
the weight of the nonzero codeword with the smallest number of 1’s.

b. Use this number to compute a bound on the probability of error with this
code.

c. Generalize the result to any binary code which has the property that the
modulo-2 sum of every pair of distinct codewords is a nonzero codeword.

6.3 Let $x$ and $y$ be real numbers and define the operation $\circ$ to mean “addition”
modulo-(1); that is

$$x \circ y \triangleq (x + y) - [\text{integral part of } (x + y)].$$

Consider $K + 1$ vectors each with $N$ components chosen with statistical
independence from a density function that is uniform over [0, 1]. Let $\{\mathbf{s}_i\}$ be the
set of all $2^K$ vectors of the form

$$\mathbf{s}_i = \mathbf{f}_0 \circ x_1 \mathbf{f}_1 \circ \cdots \circ x_K \mathbf{f}_K,$$

where the $\{x_k\}$ are 0 or 1 and “addition” is component by component.

a. Prove that any two vectors s,- and s fc are statistically independent and 
determine their joint density function. 

b. Specify a transformation on the components of the {s,-} which produces a 
new set {s/} such that 

/W a ’ eX P - 2^2 (1“| 2 + |P| 2 ) 

for all i 7^ j. 

6.4 The device shown in Fig. P6.4 is used as a coder in the following way.
A 5-bit block of message bits is inserted in the shift register. The coder is then
stepped (shifted right) 31 times to produce an output codeword of length 31,
the first 5 digits of which are the message block. For example, if 0 0 0 1 0 is
inserted, we obtain after each shift

Shift   Contents     Output

0       0 0 0 1 0
1       0 0 0 0 1    0
2       1 0 0 0 0    1
3       0 1 0 0 0    0
4       0 0 1 0 0    0
5       1 0 0 1 0    0
6       0 1 0 0 1    0
7       1 0 1 0 0    1

and so on.

Figure P6.4 ($K = 5$)-stage binary shift register

a. Show that the coder is linear; that is, if $\mathbf{x}_i$ and $\mathbf{x}_j$ are two 5-bit message
blocks and $\mathbf{y}_i$ and $\mathbf{y}_j$ are the respective coder outputs, then

$$\mathbf{x} = \mathbf{x}_i \oplus \mathbf{x}_j \Rightarrow \mathbf{y} = \mathbf{y}_i \oplus \mathbf{y}_j.$$

b. Demonstrate that if the message block contains at least one 1, the coder
returns to its original state on the 32nd shift, but not before. The shift register
is said to have a “maximal length” feedback configuration because the maximum
number of distinct nonzero shift register states is $2^K - 1$, where $K$ is the length
of the shift register. It can be shown$^{36}$ that a maximal length feedback
configuration exists for all $K$.

c. Let $\mathbf{z}_i$ be the vector obtained from $\mathbf{y}_i$ by the component transformation
$0 \to +1$, $1 \to -1$. Define $\mathbf{s}_i = \sqrt{E_N}\,\mathbf{z}_i$ to be a signal vector, each component
of which is $\pm\sqrt{E_N}$. Show that

$$\sum_{j=1}^{N} s_{ij} = \begin{cases} N\sqrt{E_N}; & \text{if } \mathbf{y}_i \text{ is all zeros,} \\ -\sqrt{E_N}; & \text{otherwise.} \end{cases}$$

d. Prove that the set of 32 signals $\{\mathbf{s}_i\}$ forms a simplex.

e. Generalize the proof to maximal length shift register coders for which the
number, $K$, of shift register stages is arbitrary.

6.5 Consider a binary parity-check coder that generates $\mathbf{y}$ by performing one
parity check on each distinct non-null subset of the components of the input
vector $\mathbf{x} = (x_1, x_2, \ldots, x_K)$.

a. Show that $\mathbf{y}$ has $N = 2^K - 1$ components.

b. The coder output $\mathbf{s} = (s_1, s_2, \ldots, s_N)$ is produced from $\mathbf{y}$ by the
component transformation $0 \to +1$, $1 \to -1$. Show that if

$$\mathbf{s}_i = (s_{i1}, s_{i2}, \ldots, s_{iN})$$

$$\mathbf{s}_j = (s_{j1}, s_{j2}, \ldots, s_{jN})$$

are codewords, the vector

$$\mathbf{s}_k = (s_{i1}s_{j1}, s_{i2}s_{j2}, \ldots, s_{iN}s_{jN})$$

obtained by multiplying corresponding components is also a codeword.

Hint. Multiplication of the numbers $+1$ and $-1$ is equivalent to addition
modulo-2 of the numbers 0 and 1.

c. Now let $\mathbf{z} \triangleq (z_1, z_2, \ldots, z_K)$ denote the vector produced from $\mathbf{x}$ by the
component transformation $0 \to +1$, $1 \to -1$. Show that the arithmetic sum

$$s_1 + s_2 + \cdots + s_N = [(1 + z_1)(1 + z_2) \cdots (1 + z_K)] - 1
= \begin{cases} 2^K - 1; & \text{all } z_j = 1,\ j = 1, 2, \ldots, K \\ -1; & \text{otherwise.} \end{cases}$$

d. Use (b) and (c) to prove that the codewords $\{\mathbf{s}_i\}$ form a simplex.

6.6 a. Calculate the exponential bound parameter $R_0'$ (in bits per channel use)
for the two discrete channel models shown in Fig. P6.6.
b. Devise a way to use the channel of (i) at rate $R_N = 1$ with $P[\mathcal{E}] = 0$.

Figure P6.6 (i) $p + q = 1$; (ii) $p_1 + p_2 + p_3 + q = 1$

6.7 a. Calculate $R_0'$ for the $A$ input letter, $Q = A + 1$ output letter discrete
memoryless erasure channel described by the transition probabilities

$$q_{ij} = q\,\delta_{ij}; \qquad i = 1, 2, \ldots, A; \quad j = 1, 2, \ldots, A,$$

$$q_{i,A+1} = 1 - q; \qquad i = 1, 2, \ldots, A.$$

b. At what maximum average rate $R_N$ (in bits per channel use) can we
communicate over the channel with $P[\mathcal{E}] = 0$ when a noiseless binary channel is
available for feedback from receiver to transmitter?

c. The capacity of a memoryless channel is not increased by feedback (see
Wolfowitz$^{86}$, pp. 48-50). Use this fact to construct a simple argument to prove
that the rate of (b) is the erasure channel’s capacity, $C_N$.

d. How do $R_0'$ and $C_N$ vary when $q$ is small and $A$ is increased?

6.8 Determine the value of the unquantized exponential bound parameter $R_0$
that results when the coder of (b) of Problem 6.3 is used to construct signals of
the form

$$s(t) = \sum_{j=1}^{N} s_j \varphi_j(t)$$

for transmission over an additive white Gaussian noise channel with power
density $N_0/2$.

6.9 Consider a discrete communication channel described by the transition
probabilities

$$P[b_j \mid a_i] = q_{ij}; \qquad i = 1, 2, \ldots, A; \quad j = 1, 2, \ldots, Q.$$

We are interested in the calculation of

$$R_0' = -\log_2 \sum_{j=1}^{Q} \left[\sum_{i=1}^{A} p_i \sqrt{q_{ij}}\right]^2$$

when the channel is very “noisy,” by which we mean that the a posteriori
probability of the input letter $a_i$, given that the output letter $b_j$ is received, differs
little (for all $i, j$) from the a priori probability $p_i$. We write this condition as

$$P[a_i \mid b_j] = (1 + \epsilon_{ij})\,p_i; \qquad |\epsilon_{ij}| \ll 1; \quad \text{all } i, j.$$

a. Define $Q_j = \sum_i p_i q_{ij}$ as the probability of receiving $b_j$. Show that

(i) $\sum_i p_i \epsilon_{ij} = \sum_j Q_j \epsilon_{ij} = 0$,

(ii) $q_{ij} = (1 + \epsilon_{ij})\,Q_j$.

b. By use of the power series 


a a 

Vl+a = l+ -- -- 


]a| < 1, 


show that 


log e (1 + a) = a - — + • ■ • ; l“l < 1» 




c. Now suppose the channel is a BSC derived from an additive white Gaussian 
noise .channel by antipodal signaling and binary quantization of receiver 
matched filter outputs. Show that 

„ lE n 


01 
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when the energy-to-noise ratio per dimension if N /oV' 0 is small. Evaluate R' 0 and 
compare with the value of R 0 obtained by unquantized reception. 

6.10 Consider a modulator that transmits one of $A$ orthogonal signals,
$\{\sqrt{E_s}\,\varphi_i(t)\}$, $i = 1, 2, \ldots, A$, over an additive white Gaussian noise channel,
$S_n(f) = N_0/2$. At the receiver the quantities

$$r_i = \int_{-\infty}^{\infty} r(t)\,\varphi_i(t)\,dt; \qquad i = 1, 2, \ldots, A$$

are calculated and listed in order of decreasing numerical value.

a. If $\sqrt{E_s}\,\varphi_k(t)$ is transmitted, show that the probability, say $Q_l$, that $r_k$ is $l$th
in the list is

$$Q_l = \binom{A-1}{l-1}\, \overline{[Q(y)]^{l-1} [1 - Q(y)]^{A-l}}; \qquad l = 1, 2, \ldots, A,$$

where the average is with respect to the unit-variance Gaussian random variable
$y$ with mean $\overline{y} = \sqrt{2E_s/N_0}$. Prove that

$$\sum_{l=1}^{A} Q_l = 1.$$

b. The receiver is said to use list of $L$ detection$^{89}$ if its detector output is an
ordered list of the subscripts of the $L$ largest $r_i$’s. Thus, if

$$r_{i_1} > r_{i_2} > \cdots > r_{i_L} > \text{all other } r_i,$$

the detector output is the list $(i_1, i_2, \ldots, i_L)$.

The system from modulator input to detector output can be modeled by a
discrete memoryless channel with $A$ input letters and $Q$ output letters, with
$Q = A(A - 1) \cdots (A - L + 1)$. [Each distinct list is an output letter.] Verify
that the transition probabilities converging onto the output letter $(1, 2, \ldots,
L)$ are as shown in Fig. P6.10, with

$$\alpha_l = Q_l\, \frac{(A - L)!}{(A - 1)!},$$

$$\beta = [Q_{L+1} + Q_{L+2} + \cdots + Q_A]\, \frac{(A - L - 1)!}{(A - 1)!}.$$

Show that a similar fan of probabilities converges onto each output letter.

Figure P6.10

c. Determine an expression for $R_0'$ in terms of the $\{Q_l\}$, assuming that all
input letters are equally likely. (See I. M. Jacobs, “Sequential Decoding with
Biorthogonal Alphabet and List Decoding,” Jet Propulsion Lab. SPS 37-33,
Vol. IV, May-June 1965, for detailed analysis of this modulation/detection
system.)

6.11 Two binary parity-check codes are said to be equivalent if one can be
converted into the other by coordinate permutations and/or reassignment of the
codeword subscripts.

a. Prove that equivalent codes afford the same $P[\mathcal{E}]$ when used on a BSC.

b. Show that any binary parity-check code $\{\mathbf{y}_i\}$ in which all codewords are
distinct is equivalent to another such code, say $\{\mathbf{y}'_i\}$, in which the first $K$ digits
of each codeword are the corresponding coder input vector $\mathbf{x}_i$. The code $\{\mathbf{y}'_i\}$ is
said to have “canonic form”.


6.12 Apply the Fano search algorithm of Fig. 6.46 to the $t$-value plot of Fig.
6.45. Detail the successive locations of the search node pointer and the values
of $T$ and $\theta$ in a fashion similar to that of Fig. 6.47.


6.13 Apply the Fano search algorithm of Fig. 6.46 to a binary tree and a
$t$-value plot of your own devising. Be certain that every box in the search
algorithm is entered at least once. Detail the successive locations of the search
node pointer and the values of $T$ and $\theta$ in a fashion similar to that of Fig. 6.47.

6.14 Consider a system communicating over a BSC at $R_N = 0.9R_0'$. If the
transmitter employs a convolutional coder and the receiver a Fano decoder with
5 μsec computation time per branch and a 1000-branch memory, estimate the
largest communication rate in bits per second consistent with


(i) P[overflow] < 10 8 , 

(ii) _ P[overflow] < 10 -8 . 


Repeat for R N = 0.5 R' 0 . 

6.15 Prove that $t(\ell)$, as given by Eqs. 6A.2 and 6A.3, reduces for the BSC to a
constant times the tilted distance function of Eq. 6.109, with $p < p' < 1/2$.


6.16 a. Consider using the erasure channel of Problem 6.7 with binary ($u = 2$)
convolutional coding and without feedback. What is the probability that $k$
successive channel input symbols will be erased? Assuming that this event
occurs, lower bound the number of branches that a Fano decoder must examine
before it can determine the correct path beyond the end of the erasures. Express
your answer as a function of the rate $R_N$.

b. Let $P_B[y]$ denote the probability that $y$ or more branches are searched by
the decoder before the first encoder input symbol is determined. Use the bound
of (a) with an appropriate choice of $k$ to establish a lower bound on $P_B[y]$. At
what rate $R_N$ does the bound imply an infinite mean number of computations
(branches searched)? Compare this value with $R_0'$.

6.17 a. Using the results of Appendix 6A, show that the &th moment of the 
number of branches in the incorrect subset for £ h searched by decoder Di is 
bounded (when retracing is discounted) by 

— l k " 

Bh ^ | X 2 2 «) - *A] 

1^=0 n = 1 i=-l 

in which the average is over the ensemble of convolutional codes. 

b. Minkowski’s inequality states that for k > 1 and any set of random 
variables {»,-} 

r7 \k-\Hk 

( 2 N < 2 (N*) 1/fc - 

- \ i / J 3 

Use this inequality to prove that B h k is finite for Rq'IRh > 2k — l. 

c. Generalize Chebyshev’s inequality to prove that for any random variable x 

P[# > y] < x k y~ k . 

d. From (b) and (c) prove that over the ensemble of codes 

PWT>y] < C e y-^ri-e+(fio7B N )], 

where C e is a number independent of y which tends to co as e -*■ 0. 

6.18 The proof of the negative capacity theorem of Chapter 5 can be applied 
to a wide variety of channels. Let us consider the binary symmetric channel. 
The transmitter maps one of M equally likely messages into an N component 
vector, 

$\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{iN}),$

where the $\{y_{ij}\}$ are each 0 or 1. The channel output is an N component output
vector $\mathbf{r} = \mathbf{y}_i \oplus \mathbf{n}$, where the components of n are statistically independent
binary random variables with probability p < 1/2 of being 1 and probability
q = 1 - p of being 0. Verify the following sequence of steps:

a. Let $\{I_i\}$ denote the decision regions: r in $I_i \Leftrightarrow \hat{m} = m_i$. As usual, the $\{I_i\}$
are disjoint. The “volume” of $I_i$, denoted $V_i$, is defined as the number of binary
vectors in $I_i$. Thus

$\sum_{i=0}^{M-1} V_i = 2^N \triangleq V.$

Obviously,

$P[C] = \frac{1}{M} \sum_{i=0}^{M-1} P[\mathbf{r} \text{ in } I_i \mid m_i].$

Verify that

$P[\mathbf{r} \text{ in } I_i \mid m_i] \le P[\mathbf{r} \text{ in } S_i \mid m_i],$

in which $S_i$ is a “sphere” around $\mathbf{y}_i$, of “radius” $\rho_i$, with the same “volume” as
$I_i$. Specifically, $S_i$ contains all vectors whose Hamming distance from $\mathbf{y}_i$ is
$\rho_i - 1$ or less plus enough vectors at Hamming distance $\rho_i$ to round out the
total to $V_i$. Thus $\rho_i$ is defined by

$\sum_{j=0}^{\rho_i - 1} \binom{N}{j} < V_i \le \sum_{j=0}^{\rho_i} \binom{N}{j}.$

b. Verify that

$P[C] \le \frac{1}{M} \sum_{i=0}^{M-1} P[\mathbf{r} \text{ in } S_i \mid m_i] = P[\,|\mathbf{n}|^2 \le \rho\,],$

where $|\mathbf{n}|^2$ is the number of 1’s in n, and $S_i$ is a sphere of volume V/M and radius
$\rho$ centered on $\mathbf{y}_i$.

c. Verify that $P[C] < \epsilon$ for large enough N whenever $\rho < (p - \Delta)N$.

d. Prove that $\Delta$ can be taken small enough so that $\rho < (p - \Delta)N$ for large
enough N whenever $R_N > C_N$, where

$M = 2^{N R_N}, \qquad C_N = 1 - H(p).$

Hint. From Stirling’s approximation to the factorial, for large N

$\sum_{j=0}^{\rho-1} \binom{N}{j} \ge \frac{1}{N}\, 2^{N H\left(\frac{\rho-1}{N}\right)},$

in which H is the binary entropy function, defined as

$H(x) = -x \log_2 x - (1 - x) \log_2 (1 - x).$

Show first that $\rho$ satisfies the inequality

$H\!\left(\frac{\rho - 1}{N}\right) < 1 - R_N + \frac{1}{N} \log_2 N,$

which, if $1 - R_N < H(p)$, implies $(\rho - 1)/N < p - \Delta$ for sufficiently large N
and small $\Delta$.

e. Evaluate $C_N$ when the BSC is derived from antipodal signaling over an
additive white Gaussian noise channel with a quantized receiver. Assume that
$E_N/\mathcal{N}_0 \ll 1$ and compare with $R_0'$.


7 

Important Channel Models 


In this chapter we extend the results of Chapters 4, 5, and 6 to channel 
models that more closely approximate certain aspects of actual com- 
munication systems. In particular, we consider additive, Gaussian noise 
channels in which the noise is not white and in which the received signal 
component may have an unknown attenuation or phase or both. The 
tools used in the study of these channels — the theorem on reversibility, 
the whitening filter, the representation of narrow-band noise, and the 
elimination of random parameters by integration — are exceedingly power- 
ful and may be applied to still more general channel models involving time- 
variant dispersive propagation. 49 

7.1 EFFECTS OF FILTERING 

In Chapter 4 we assumed that the signal component of the received 
waveform was unaffected by transmission except for the addition of white 
Gaussian noise. Actually, noise is never exactly white, and in some cases 
a white-noise approximation may be seriously inappropriate. Also, most 
transmission media alter the waveshapes of the transmitted signals in one 
way or another. 

It is frequently possible to attribute such phenomena to the action of 
linear filters operating on white noise to “color” it or on the signal to 
modify it. We deal first with channels containing a filter whose transfer 
function is known. 

Filtered-Signal Channels 

The channel pictured in Fig. 7.1 is a simple extension of the additive
white Gaussian noise channel of Chapter 4 and requires no new tools for
its analysis. The linear filter† is time-invariant, hence is characterized by
its impulse response h(t), which is assumed to be known at the receiver.
As a trivial example, we might have

$h(t) = a\,\delta(t - \tau)$ (7.1a)

so that

$s^\circ(t) = a\, s(t - \tau).$ (7.1b)

Here a is a known attenuation and $\tau$ is a known delay.

† The restrictions of linearity and time-invariance are unnecessary except insofar as
they facilitate description of the operation relating s(t) and s°(t). The important
assumption is that the receiver in Fig. 7.1 must be able to calculate s°(t) from knowledge
of s(t) and the transmission characteristic of the channel.



Figure 7.1 The filtered-signal channel with additive white Gaussian noise. Because

$s_i^\circ(t) = \int_{-\infty}^{\infty} s_i(\alpha)\, h(t - \alpha)\, d\alpha; \quad i = 0, 1, \ldots, M - 1,$

the filter input and output processes are related by $s^\circ(t) = s(t) * h(t)$.


The derivation of the optimum receiver for the additive white Gaussian
channel with filtering is identical to the derivation in Chapter 4 for the
channel without filtering. Since the essential condition that the $\{m_i\}$
uniquely specify the $\{s_i^\circ(t)\}$ is satisfied, we need only add a superscript
to each signal waveform and corresponding vector. Thus the optimum
receiver selects $\hat{m} = m_i$ if and only if

$P[m_i]\; p_{\mathbf{r}}(\boldsymbol{\rho} \mid \mathbf{s}^\circ = \mathbf{s}_i^\circ)$ (7.2a)

is maximum or (equivalently) if and only if

$\int_{-\infty}^{\infty} r(t)\, s_i^\circ(t)\, dt + c_i^\circ$ (7.2b)

is maximum, where

$c_i^\circ \triangleq \frac{\mathcal{N}_0}{2} \ln P[m_i] - \frac{1}{2} \int_{-\infty}^{\infty} [s_i^\circ(t)]^2\, dt.$ (7.2c)


The probability of error depends on the location of the vectors $\{\mathbf{s}_i^\circ\}$
rather than directly on the vectors $\{\mathbf{s}_i\}$. In general, the $\{\mathbf{s}_i^\circ\}$ must be
determined from the filtered waveforms by means of the Gram-Schmidt
procedure of Appendix 4A. It is not usually possible to avoid
this step and obtain the $\{\mathbf{s}_i^\circ\}$ directly from the $\{\mathbf{s}_i\}$ because the same set
of N orthonormal functions $\{\varphi_j(t)\}$ cannot in general be used to represent
both the $\{s_i(t)\}$ and the $\{s_i^\circ(t)\}$.

One form of optimum receiver is the correlation receiver shown in
Fig. 7.2. Since the reference waveforms $\{s_i(t)\}$ are filtered before being
correlated with r(t), this implementation is often called a
“filtered-reference” receiver.




Figure 7.2 Filtered-reference correlation receiver. 


The matched-filter version of the optimum receiver is shown in Fig. 7.3.
Since the spectrum of $s_i^\circ(t)$ is

$S_i^\circ(f) = S_i(f)\, H(f),$ (7.3a)

we know from Eq. 4.67 that the transfer function of a filter matched to
$s_i^\circ(t)$ with delay T is

$G_i(f) = [S_i^\circ(f)]^* e^{-j2\pi fT} = H^*(f)\, e^{-j2\pi fT_1}\; S_i^*(f)\, e^{-j2\pi fT_2}.$ (7.3b)

Thus each matched filter, i = 0, 1, . . . , M - 1, may be realized as the
cascade of a filter $H^*(f)\, e^{-j2\pi fT_1}$ matched to h(t) and a filter $S_i^*(f)\, e^{-j2\pi fT_2}$
matched to $s_i(t)$, with $T_1 + T_2 = T$. Since the first filter is common to all i,
it is economical to place it directly after the channel, as shown in Fig. 7.3.
The resulting structure is called a “filtered-signal” receiver. If the
(unfiltered) signals are expressed as

$s_i(t) = \sum_{j=1}^{N} s_{ij}\, \varphi_j(t); \quad i = 0, 1, \ldots, M - 1,$ (7.4)

the final receiver stages may be implemented with only N filters, matched
to the $\{\varphi_j(t)\}$, as discussed in Chapter 4.

Figure 7.3 Filtered-signal receiver with M signal-matched filters. The filter
outputs are sampled at $t = T_1 + T_2$.

It is interesting to note that the final stages of the filtered-signal receiver 
are identical in form to the optimum receiver for white Gaussian noise, 
even though at the output of the channel-matching filter the noise is not
white and the transmitted signal is distorted. Of course it does not follow 
that the error probability is the same as that obtained with white Gaussian 
noise and without channel filtering. 

If the relationship between the transmitted signal s(t) and the received
signal s°(t) is statistical rather than causal (for instance if a in Eq. 7.1b is
a random parameter), the unique correspondence between the $\{m_i\}$ and
the $\{s_i^\circ(t)\}$ will be broken, and the foregoing analysis will be invalid. Such
situations are considered in Section 7.3. As a practical matter, whether 
we should treat a parameter as random or known depends on whether the 
vector signal constellation and the decision regions implied thereby are 
seriously affected when the value of the parameter varies over its probable 
range (see the discussion of component accuracy, Section 4.4). 

Theorem on Reversibility 

Analysis is somewhat more complicated when a linear filter colors 
white Gaussian noise before it is added to the received signal. In this and 
certain other cases to be considered we may construct the receiver in two 
steps, as shown in Fig. 7.4. First, an operation is performed on the
channel output r(t) to yield a new output, r°(t); second, an optimum
receiver is designed with r°(t) as input. Clearly, such a two-step procedure
cannot result in a lower probability of error than the one-step process of
designing an optimum receiver directly for the input r(t); indeed, an
increase in the probability of error may result. But if an inverse operation
exists which permits r(t) to be reconstructed from r°(t) (as discussed in
Section 4.2), the theorem on reversibility states that the probability of error
need not be increased by the two-step procedure. [The second step may
consist of the calculation of r(t) from r°(t) followed by the optimum
receiver for r(t).] In its most general form the theorem on reversibility
states:

A reversible operation, transforming r(t) into one or more waveforms,
may be inserted between the channel output and the receiver without affecting
the minimum attainable probability of error.

Figure 7.4 Insertion of an operation between channel and receiver.


Additive Nonwhite Gaussian Noise 

The theorem on reversibility may be applied immediately to the design 
of an optimum receiver for a channel with additive nonwhite Gaussian 
noise. Suppose that the received signal is 

$r(t) = s(t) + n(t),$ (7.5)

where n(t) is zero-mean Gaussian noise whose power spectrum $\mathcal{S}_n(f)$ is
not constant for all frequencies. Under very weak conditions,† which we
may always assume to be satisfied in a physical system, there exists a
linear filter, with impulse response g(t) and transfer function G(f), which
has a realizable inverse and for which

$|G(f)|^2 = \frac{\mathcal{N}_0/2}{\mathcal{S}_n(f)}.$ (7.6a)

If n(t) is the input to this filter, the output (by Eq. 3.114) is a Gaussian
process with power spectrum

$|G(f)|^2\, \mathcal{S}_n(f) = \mathcal{N}_0/2;$ (7.6b)

that is, the output is white noise. The filter G(f) is called a whitening
filter.

As an example, consider

$\mathcal{S}_n(f) = \frac{f^2 + 4}{f^2 + 1}.$ (7.7a)


† The condition is that the Paley-Wiener 38 criterion

$\int_{-\infty}^{\infty} \frac{|\ln \mathcal{S}_n(f)|}{1 + f^2}\, df < \infty$

be satisfied. If it is not, the noise is singular in the sense that its future values may be
predicted exactly from knowledge of any interval of its past. We have already observed
one such example in Appendix 5B. For a complete discussion, see W. L. Root,
“Singular Gaussian Measures in Detection Theory,” Chapter 20 of Time Series Analysis
(M. Rosenblatt, Ed.), John Wiley & Sons, New York, 1963.






Figure 7.5 Example of power spectrum of noise and whitening filter. Panel (b)
shows $|G(f)|^2 = (f^2 + 1)/(f^2 + 4)$.

As illustrated in Fig. 7.5, we may set

$G(f) = \frac{jf + 1}{jf + 2},$ (7.7b)

which yields at its output the power spectrum $\mathcal{S}_{n_w}(f) = \mathcal{N}_0/2 = 1$. A
possible implementation is shown in Fig. 7.6a, in which n(t) and $n_w(t)$ are
assumed to be voltages. Clearly, the filter G(f) has a realizable inverse,
to wit, the filter with system function

$G^{-1}(f) = \frac{jf + 2}{jf + 1}.$ (7.7c)

A realization for the filter $G^{-1}(f)$ is shown in Fig. 7.6b.
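The whitening property of Eqs. 7.6 and 7.7 can be checked numerically. The following is a minimal sketch, not part of the original development: the function names and the list of test frequencies are our own illustrative choices, and only the Python standard library is assumed.

```python
# Sketch: check that G(f) = (jf + 1)/(jf + 2) whitens S_n(f) = (f^2 + 4)/(f^2 + 1).
# Function names and the test frequencies are illustrative assumptions.

def noise_spectrum(f: float) -> float:
    """Example noise power spectrum of Eq. 7.7a."""
    return (f * f + 4.0) / (f * f + 1.0)

def whitening_filter(f: float) -> complex:
    """Whitening filter of Eq. 7.7b."""
    return complex(1.0, f) / complex(2.0, f)

def inverse_filter(f: float) -> complex:
    """Inverse filter of Eq. 7.7c."""
    return complex(2.0, f) / complex(1.0, f)

freqs = [-10.0, -1.0, 0.0, 0.5, 3.0, 100.0]

# Output spectrum |G(f)|^2 S_n(f); by Eq. 7.6b it should equal N_0/2 = 1 everywhere.
whitened = [abs(whitening_filter(f)) ** 2 * noise_spectrum(f) for f in freqs]

# Cascade of the filter and its inverse; reversibility requires a unit response.
round_trip = [whitening_filter(f) * inverse_filter(f) for f in freqs]

for f, w in zip(freqs, whitened):
    print(f"f = {f:7.2f}   |G|^2 S_n = {w:.12f}")
```

Because the cascade of the two filters has unit response at every frequency tested, the operation is reversible in the sense required by the theorem on reversibility.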

The filter G(f) in Eq. 7.7 was determined by a general method that is
applicable whenever $\mathcal{S}_n(f)$ can be written as a ratio of two polynomials
in f. The method is discussed in Appendix 7A.

Figure 7.6 Realization of whitening filter and inverse for $\mathcal{S}_n(f) = (f^2 + 4)/(f^2 + 1)$:
(a) G(f); (b) $G^{-1}(f)$.

Figure 7.7 Optimum receivers for nonwhite Gaussian noise channel, r(t) = s(t) + n(t):
(a) filtered-reference receiver; (b) filtered-signal receiver.

Once the (reversible) whitening filter is known, the optimum receiver is 
easily determined. Place the whitening filter at the channel output. Since 
the filter input is r(t) = s(t) + n(t), the filter output is 

$r^\circ(t) = s^\circ(t) + n_w(t),$ (7.8)

where s°(t) is the response of the whitening filter to the input s(t). The
combination of channel and G(f) in cascade appears as an additive white
Gaussian noise, filtered-signal channel. The filtered-reference receiver
for r°(t) is that given in Fig. 7.2, hence the entire receiver for an additive
channel with nonwhite Gaussian noise is that shown in Fig. 7.7a. This
receiver is optimum because the filtering operation G(f) is reversible.

It is also instructive to consider the filtered-signal version of the optimum
receiver. The insertion of the whitening filter G(f) calls for a
channel-matching filter $G^*(f)\, e^{-j2\pi fT_1}$, so that the cascade of the two together
yields the over-all system function

$G(f)\, G^*(f)\, e^{-j2\pi fT_1} = \frac{\mathcal{N}_0/2}{\mathcal{S}_n(f)}\, e^{-j2\pi fT_1}.$ (7.9)

The cascade may be implemented as a single filter, as shown in Fig. 7.7b;
we choose $T_1$ large enough so that the filter (or a satisfactory
approximation of it) is realizable.† The effect of the filter is to pass energy over
frequency bands where the noise power is weak and to suppress energy
over frequency bands where it is strong.


7.2 BANDPASS CHANNELS 

We now consider an important special case of the filtered-signal channel
of Fig. 7.1, the bandpass channel illustrated in Fig. 7.8. The filter $W_0(f)$
is an ideal bandlimited filter, with bandwidth 2W and center frequency
$f_0 > W$, defined by the system function

$W_0(f) = \begin{cases} 1; & f_0 - W < |f| < f_0 + W \\ 0; & \text{elsewhere.} \end{cases}$ (7.10)

Figure 7.8 Ideal bandpass additive white Gaussian noise channel.

Since this channel is a special case of the general filtered channel, the
preceding derivation of the optimum receiver and the implementation
diagrammed in Fig. 7.2 remain valid. It is common engineering practice,
however, to produce the signal set $\{s_i^\circ(t)\}$ directly at the transmitter by
first generating low-frequency (baseband) signals $\{s_i(t)\}$ and then
translating them in frequency (heterodyning them) up into the passband of
$W_0(f)$. At the receiver the signals are heterodyned back down from
passband to baseband, and the decision about the transmitted message is
based on the resulting low-frequency waveforms. The technique is useful
even when the channel noise is not white.

† Determination of the best receiver is more involved when a constraint is imposed
on the maximum allowable value of the delay $T_1$. 21

DSB-SC Modulation 

A possible system of this sort is shown in Fig. 7.9: the frequency
translation at the transmitter is called double-sideband suppressed carrier
(DSB-SC) amplitude modulation and that at the receiver is called
synchronous demodulation. We shall consider bandpass channels identical to that
of Fig. 7.8, except generalized to situations in which the power spectrum
of the noise is not white. The ideal lowpass modulator filter

$W(f) \triangleq \begin{cases} 1; & |f| < W \\ 0; & \text{elsewhere} \end{cases}$ (7.11)

has been introduced into Fig. 7.9 as an explicit reminder [for all $s_i(t)$]
that $S_i(f) = 0$ for |f| > W, where $S_i(f)$ denotes the Fourier transform
of $s_i(t)$.

Figure 7.9 System utilizing DSB-SC modulation-demodulation; $\omega_0 \triangleq 2\pi f_0$. The
DSB-SC modulator and the DSB-SC demodulator each multiply by $\sqrt{2} \cos \omega_0 t$.

The reason for the nomenclature “double-sideband suppressed carrier”
is clarified in Fig. 7.10. Since multiplying two functions in the time domain
corresponds to convolving their spectra in the frequency domain, the
Fourier spectrum of any transmitted signal $s_i^\circ(t)$ contains two sidebands
located symmetrically about $\pm f_0$ and therefore occupies twice the
bandwidth occupied by $s_i(t)$. The sinusoid $\sqrt{2} \cos \omega_0 t$ is called the carrier;
the fact that $s_i^\circ(t)$ does not contain any discrete Fourier component at
$f_0$ accounts for the suppressed-carrier terminology.

Figure 7.10 Spectra of baseband waveform $s_i(t)$ and DSB-SC modulated waveform
$s_i^\circ(t)$.

The first thing to notice about the DSB-SC system of Fig. 7.9 is that
in the absence of noise the lowpass signals $\{s_i(t)\}$ are reproduced at the
output of the DSB-SC demodulator without any alteration. (We assume
throughout this section that the receiver knows the exact carrier phase.
The random-phase case is considered in Section 7.3.) Indeed, remultiplying
by $\sqrt{2} \cos \omega_0 t$ at the receiver and lowpass filtering with W(f) exactly
undoes the modulation performed at the transmitter: using the subscript
“lp” to mean “the low-frequency components of,” we have

$\{[s_i(t) \sqrt{2} \cos \omega_0 t][\sqrt{2} \cos \omega_0 t]\}_{lp} = \{2 s_i(t)[\tfrac{1}{2} + \tfrac{1}{2} \cos 2\omega_0 t]\}_{lp} = s_i(t).$ (7.12)

The normalizing factors $\sqrt{2}$ maintain the energy of $s_i^\circ(t)$ equal to that of
$s_i(t)$; by inspection of Fig. 7.10 and use of Parseval’s theorem we have

$\int_{-\infty}^{\infty} [s_i(t)]^2\, dt = \int_{-\infty}^{\infty} |S_i(f)|^2\, df = \int_{-\infty}^{\infty} |S_i^\circ(f)|^2\, df = \int_{-\infty}^{\infty} [s_i^\circ(t)]^2\, dt.$ (7.13)
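Equations 7.12 and 7.13 can be illustrated with a short numerical sketch. The baseband test signal, the carrier frequency, and the sample count below are assumptions of ours, chosen so that every sinusoid completes a whole number of cycles in the one-second window; the ideal lowpass filter is represented by reading off Fourier coefficients at the baseband frequencies.

```python
import math

# Sketch of DSB-SC modulation and synchronous demodulation (Eqs. 7.12 and 7.13).
# F0, N, and the test signal s(t) are illustrative assumptions.
F0 = 20.0   # carrier frequency in Hz, well above the baseband bandwidth
N = 4096    # number of samples over a one-second window

def s(t: float) -> float:
    """Hypothetical lowpass signal with components at 1 Hz and 2 Hz."""
    return math.cos(2 * math.pi * 1.0 * t) + 0.5 * math.sin(2 * math.pi * 2.0 * t)

def carrier(t: float) -> float:
    """Normalized carrier sqrt(2) cos(w0 t)."""
    return math.sqrt(2.0) * math.cos(2 * math.pi * F0 * t)

ts = [k / N for k in range(N)]
baseband = [s(t) for t in ts]
modulated = [s(t) * carrier(t) for t in ts]                     # transmitted waveform
demod_product = [x * carrier(t) for x, t in zip(modulated, ts)]  # before lowpass filtering

def energy(samples):
    """Energy over the one-second window (rectangle rule)."""
    return sum(x * x for x in samples) / N

def cos_coeff(samples, freq):
    """Amplitude of the cosine component of the samples at the given frequency."""
    return 2.0 * sum(x * math.cos(2 * math.pi * freq * t) for x, t in zip(samples, ts)) / N

def sin_coeff(samples, freq):
    """Amplitude of the sine component of the samples at the given frequency."""
    return 2.0 * sum(x * math.sin(2 * math.pi * freq * t) for x, t in zip(samples, ts)) / N

print("baseband energy :", energy(baseband))
print("modulated energy:", energy(modulated))
print("recovered 1 Hz cosine amplitude:", cos_coeff(demod_product, 1.0))
print("recovered 2 Hz sine amplitude  :", sin_coeff(demod_product, 2.0))
```

The low-frequency components of the demodulator product carry the original amplitudes, while the term at twice the carrier frequency contributes nothing to them.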

The reason for the restriction $f_0 > W$ in Eq. 7.10 is illustrated in
Fig. 7.11a: if $f_0 < W$, the spectral terms $S_i(f + f_0)$ and $S_i(f - f_0)$ overlap
around f = 0, and $s_i(t)$ is not regained exactly from $s_i^\circ(t)$ by DSB-SC
demodulation; the resulting distortion is called “aliasing.” In engineering
practice we deal with actual lowpass signals rather than ideal ones (which are,
of course, unrealizable). Aliasing becomes a problem when the spectrum
skirts and frequency $f_0$ are so related that the spectral overlap is significant,
as indicated in Fig. 7.11b.

Figure 7.11 Distortion in DSB-SC operation due to aliasing.

It remains to determine the conditions under which the modulation and 
demodulation operations, in addition to being convenient from an 
engineering point of view, do not degrade the error performance that can 
be achieved when stationary Gaussian noise n(t ) (which may or may not be 
white) is present. The derivation is somewhat long, but it can be broken 
down into a series of relatively straightforward steps, each of which
provides important insight into the problem.

Figure 7.12 Bandpass channel output filtering. The complementary filter has system
function 0 for $f_0 - W < |f| < f_0 + W$ and 1 elsewhere. It is clear that
$r(t) = r_1(t) + r_2(t)$.

Elimination of out-of-band noise. The first step is to observe that a
filter $W_0(f)$ can be applied at the channel output, as shown in Fig. 7.12,
without loss of optimality. Proof follows from the fact that the noise
$r_2(t)$ outside the passband of $W_0(f)$ is irrelevant and may therefore be
discarded. To see this, we observe that no signal energy lies outside the
passband and make the usual assumption that the noise process n(t) and
the signal process s°(t) are statistically independent. Next, we recall from
Eq. 3.130 that passing a Gaussian noise process through two linear filters
with disjoint passbands results in two jointly Gaussian processes that are
statistically independent. Thus $r_2(t)$ contains no signal and is statistically
independent of the additive noise in $r_1(t)$, which proves irrelevance.

Bandpass filtering with lowpass filters. The second step in proving
the optimality of the DSB-SC modulation system is to observe that the
bandpass filter $W_0(f)$ can be implemented by means of the parallel
sine-cosine demodulator-modulator cascades shown in Fig. 7.13. Let $h_\tau(t)$
denote the response of this demodulator-modulator complex to an
impulse $\delta(t - \tau)$ applied at time $\tau$ and let w(t) denote the impulse response
of the lowpass filter W(f). We observe directly from Fig. 7.13 that

$$\begin{aligned}
h_\tau(t) &= \sqrt{2} \cos \omega_0 t \int_{-\infty}^{\infty} \delta(\alpha - \tau)\, \sqrt{2} \cos \omega_0 \alpha\; w(t - \alpha)\, d\alpha \\
&\quad + \sqrt{2} \sin \omega_0 t \int_{-\infty}^{\infty} \delta(\beta - \tau)\, \sqrt{2} \sin \omega_0 \beta\; w(t - \beta)\, d\beta \\
&= 2\, w(t - \tau)[\cos \omega_0 t \cos \omega_0 \tau + \sin \omega_0 t \sin \omega_0 \tau] \\
&= 2\, w(t - \tau) \cos \omega_0 (t - \tau).
\end{aligned}$$ (7.14)

Figure 7.13 Demodulator-modulator cascade with parallel sine and cosine channels.

On the other hand, since $W_0(f)$ is obtained when W(f) is convolved with
$[\delta(f - f_0) + \delta(f + f_0)]$, the impulse response of $W_0(f)$ is the product of
w(t) and the inverse Fourier transform of $[\delta(f - f_0) + \delta(f + f_0)]$, namely
$2 \cos \omega_0 t$. Hence the response of $W_0(f)$ to the delayed impulse $\delta(t - \tau)$
is $2\, w(t - \tau) \cos \omega_0 (t - \tau)$, which is identical to the response of the
demodulator-modulator complex given by Eq. 7.14. We conclude that
the demodulator-modulator complex is interchangeable with the
time-invariant filter $W_0(f)$: both produce the same output when driven by a
common input.

By combining the results of the first two steps of our analysis, we see 
that the cascaded receiver arrangement of Fig. 7.14a entails no loss of 
optimality: the demodulator-modulator complex is equivalent to the 
bandpass filter W 0 (f) of Fig. 7.12 and acts only to discard irrelevant noise 
outside the signal band. 

Since the output of the modulator and summing stages in Fig. 7.14a 
is sufficient for making an optimum determination of m, so also must be 
the two modulator inputs. Thus we may drop the modulator and summing 
stages and build an optimum receiver that operates directly on the lowpass 
signals $r_c(t)$ and $r_s(t)$, as shown in Fig. 7.14b.

Demodulated noise. Figure 7.14b, except for the sine demodulator,
is identical in form to the corresponding stages of Fig. 7.9. Moreover,
with reference to these figures we note that

$[(s_i(t) \sqrt{2} \cos \omega_0 t)(\sqrt{2} \sin \omega_0 t)]_{lp} = [s_i(t) \sin 2\omega_0 t]_{lp} = 0$ (7.15)

for all i, so that no signal term is present in $r_s(t)$. When $m = m_i$, we
therefore have

$r_c(t) = s_i(t) + n_c(t)$ (7.16a)

$r_s(t) = n_s(t).$ (7.16b)

The third step in determining the optimality of our DSB-SC receiver is
to ascertain under what conditions the noise $n_s(t)$ is irrelevant and may be
discarded. Since $n_c(t)$ and $n_s(t)$ result from linear (albeit time-variant)
operations performed on the stationary Gaussian process n(t), they are
jointly Gaussian. We assume that n(t), hence $n_c(t)$ and $n_s(t)$, is zero-mean.
It follows that $n_s(t)$ is statistically independent of $n_c(t)$ and may be
discarded if and only if the crosscorrelation function

$\mathcal{R}_{cs}(t_1, t_2) \triangleq \overline{n_c(t_1)\, n_s(t_2)}$ (7.17a)

is zero for all observation instants $t_1$ and $t_2$.




Figure 7.14 Optimum receiver for sine and cosine DSB-SC demodulation;
$r(t) = s^\circ(t) + n(t)$.

In this section we determine the crosscorrelation function $\mathcal{R}_{cs}$, as well
as the autocorrelation functions

$\mathcal{R}_c(t_1, t_2) \triangleq \overline{n_c(t_1)\, n_c(t_2)}$ (7.17b)

$\mathcal{R}_s(t_1, t_2) \triangleq \overline{n_s(t_1)\, n_s(t_2)}.$ (7.17c)

From Fig. 7.14b

$n_c(t) = \sqrt{2} \int_{-\infty}^{\infty} n(\alpha) \cos \omega_0 \alpha\; w(t - \alpha)\, d\alpha$ (7.18a)

$n_s(t) = \sqrt{2} \int_{-\infty}^{\infty} n(\alpha) \sin \omega_0 \alpha\; w(t - \alpha)\, d\alpha,$ (7.18b)

A parallel derivation leads to the result

$\overline{y(t_1)\, y(t_2)} = 2 \int_{-\infty}^{\infty} \mathcal{S}_n(f)\, W(f + f_0)\, W^*(f - f_0) \exp \{j2\pi[t_1(f + f_0) - t_2(f - f_0)]\}\, df = 0.$ (7.21b)

Here the identity stems from the fact that

$W(f + f_0)\, W^*(f - f_0) = 0;$ for all f

whenever $f_0 > W$.

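We now combine Eqs. 7.21a and b. From Eqs. 7.19

$\overline{y(t_1)\, y^*(t_2)} = \overline{[n_c(t_1) + j n_s(t_1)][n_c(t_2) - j n_s(t_2)]}$ (7.22a)

and

$\overline{y(t_1)\, y(t_2)} = \overline{[n_c(t_1) + j n_s(t_1)][n_c(t_2) + j n_s(t_2)]}.$ (7.22b)

Equating the real and imaginary parts in both Eqs. 7.21 and 7.22 and
invoking the definitions of Eqs. 7.17, we have

$\mathcal{R}_c(\tau) = \mathcal{R}_s(\tau) = \mathrm{Re} \int_{-W}^{W} \mathcal{S}_n(f - f_0)\, e^{j2\pi f\tau}\, df$ (7.23a)

$\mathcal{R}_{sc}(\tau) = -\mathcal{R}_{cs}(\tau) = \mathrm{Im} \int_{-W}^{W} \mathcal{S}_n(f - f_0)\, e^{j2\pi f\tau}\, df.$ (7.23b)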

Finally, if we write $\mathcal{S}_n(f - f_0)$, $-W < f < W$, in terms of its even and odd
parts, say $\mathcal{S}_e(f)$ and $\mathcal{S}_o(f)$, respectively, we obtain the important results

$\mathcal{S}_n(f - f_0) = \mathcal{S}_e(f) + \mathcal{S}_o(f); \quad -W < f < W,$ (7.24a)

$\mathcal{R}_c(\tau) = \mathcal{R}_s(\tau) = \int_{-W}^{W} \mathcal{S}_e(f) \cos 2\pi f\tau\, df$ (7.24b)

$\mathcal{R}_{sc}(\tau) = -\mathcal{R}_{cs}(\tau) = \int_{-W}^{W} \mathcal{S}_o(f) \sin 2\pi f\tau\, df$ (7.24c)

where, as before, $\tau = t_1 - t_2$. The mechanics of Eqs. 7.24 are illustrated
in Fig. 7.15.
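A numerical sketch may help fix the mechanics of Eqs. 7.23 and 7.24. The tilted spectrum used below for $\mathcal{S}_n(f - f_0)$ over [-W, W] is a hypothetical example of ours, as are the bandwidth W = 1 and the midpoint-rule integration; the sketch compares the real and imaginary parts of the complex integral with the cosine and sine integrals over the even and odd parts.

```python
import cmath
import math

# Sketch: correlation functions of the demodulated noise (Eqs. 7.23 and 7.24).
# W, STEPS, and the two spectra are illustrative assumptions.
W = 1.0
STEPS = 2000

def tilted_spectrum(f: float) -> float:
    """Hypothetical S_n(f - f0) on [-W, W]: not even about f = 0."""
    return 1.0 + 0.5 * f / W

def flat_spectrum(f: float) -> float:
    """White-noise case: S_n(f - f0) constant on [-W, W]."""
    return 1.0

def even_part(S, f):
    return 0.5 * (S(f) + S(-f))

def odd_part(S, f):
    return 0.5 * (S(f) - S(-f))

def integrate(g):
    """Midpoint rule over [-W, W] on a grid symmetric about f = 0."""
    h = 2.0 * W / STEPS
    return h * sum(g(-W + (k + 0.5) * h) for k in range(STEPS))

def correlations_direct(S, tau):
    """(R_c, R_sc) as the real and imaginary parts of the integral in Eqs. 7.23."""
    z = integrate(lambda f: S(f) * cmath.exp(2j * math.pi * f * tau))
    return z.real, z.imag

def correlations_even_odd(S, tau):
    """(R_c, R_sc) from the even and odd parts, Eqs. 7.24b and 7.24c."""
    r_c = integrate(lambda f: even_part(S, f) * math.cos(2 * math.pi * f * tau))
    r_sc = integrate(lambda f: odd_part(S, f) * math.sin(2 * math.pi * f * tau))
    return r_c, r_sc

for tau in [0.0, 0.1, 0.37, 1.0]:
    print(tau, correlations_direct(tilted_spectrum, tau),
          correlations_even_odd(tilted_spectrum, tau))
```

With the tilted spectrum the crosscorrelation vanishes at $\tau = 0$ but not in general; with the flat spectrum it vanishes for every $\tau$ tried.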

In interpreting the significance of Eqs. 7.24, it is important to note that 
the derivation does not require that n(t) be a Gaussian process but only
that it be wide-sense stationary. From this attribute alone it follows that
the two demodulated processes $n_c(t)$ and $n_s(t)$ are also wide-sense stationary
and that each has the same power spectrum, $\mathcal{S}_e(f)$. Remarkably, this is
true even though these processes are obtained from n(t) by means of a
time-variant operation. We note that samples taken from the two lowpass
processes at the same instant (implying $\tau = 0$) are always uncorrelated;
$\overline{n_c(t)\, n_s(t)} = 0$. The samples are uncorrelated for all observation instants
$t_1$ and $t_2$ if and only if $\mathcal{S}_n(f - f_0)$ is even over the band [-W, W].




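$\mathcal{S}_e(f) = \tfrac{1}{2}[\mathcal{S}_n(f - f_0) + \mathcal{S}_n(-f - f_0)]; \quad -W < f < W$

$\mathcal{S}_o(f) = \tfrac{1}{2}[\mathcal{S}_n(f - f_0) - \mathcal{S}_n(-f - f_0)]; \quad -W < f < W$

Figure 7.15 Determination of $\mathcal{S}_e(f)$ and $\mathcal{S}_o(f)$ from $\mathcal{S}_n(f)$.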


Optimum receivers. When the stationary noise process n(t) is Gaussian,
$n_c(t)$ and $n_s(t)$ are statistically independent as long as they are
uncorrelated for all pairs of observation instants. Thus $n_s(t)$ is irrelevant
whenever $\mathcal{S}_n(f - f_0)$ is even over [-W, W]. It is under this condition that the
sine demodulator in Fig. 7.14b can be discarded without increasing the
minimum attainable probability of error. In particular, the condition is
met when n(t) is white Gaussian noise. In this case, to which we now
direct our attention,

$\mathcal{S}_n(f - f_0) = \mathcal{S}_n(-f - f_0) = \frac{\mathcal{N}_0}{2}.$ (7.25a)

Thus

$\mathcal{S}_c(f) = \mathcal{S}_s(f) = \begin{cases} \mathcal{N}_0/2; & \text{for } |f| < W \\ 0; & \text{elsewhere} \end{cases}$ (7.25b)

and

$\mathcal{R}_{cs}(\tau) = 0.$ (7.25c)

The fourth and final step in deriving the optimum DSB-SC receiver for
white Gaussian noise is to make an optimum decision in regard to the
transmitted signal on the basis of the remaining baseband signal

$r_c(t) = s(t) + n_c(t).$ (7.26)

This problem we have already solved in Chapter 4. If the signals $\{s_i(t)\}$
were to be transmitted directly over a channel corrupted by additive white
Gaussian noise, the optimum receiver would crosscorrelate the received
signal against each of the $\{s_i(t)\}$. The only difference implied by Eqs. 7.25b
and 7.26 is that the channel is effectively cascaded with a lowpass filter
W(f). Clearly, such a filter does not affect the optimality of the Chapter 4
receiver, since the transmitted signal is not changed thereby and the noise
outside this filter band is irrelevant. It follows that the DSB-SC receiver
shown in Fig. 7.16 is optimum and that the over-all system performance is
identical to that which would be obtained if the lowpass signals $\{s_i(t)\}$
were used, without modulation, over a white Gaussian noise channel with
the same noise power density $\mathcal{N}_0/2$. The fact that the lowpass filter W(f)
may be omitted from the receiver demodulator should be noted: since
correlation of $r_c(t)$ with the $\{s_i(t)\}$ can also be performed by matched
filtering and each matched filter is itself a lowpass filter, W(f) is redundant.

Figure 7.16 Complete DSB-SC system for white Gaussian noise.

If $\mathcal{S}_n(f - f_0)$ is even but not constant over [-W, W], a realization
of the optimum lowpass receiver may be constructed by passing $r_c(t)$
through an invertible filter that effectively “whitens” $n_c(t)$ over the
baseband.

Bandpass signal decomposition. The equivalence of the
demodulator-modulator complex of Fig. 7.13 to the bandpass filter $W_0(f)$ has other
interesting implications, the most immediately obvious being that any
Fourier-transformable bandpass signal of bandwidth 2W may be written
in terms of two lowpass signals, each of bandwidth W, as

$s(t) = s_c(t)\, \sqrt{2} \cos 2\pi f_0 t + s_s(t)\, \sqrt{2} \sin 2\pi f_0 t.$ (7.27)

The notation is that of Fig. 7.17a, with x(t) set equal to s(t). Thus the use
of two DSB-SC modulators, one with a cosine and one with a sine carrier,
permits the generation of completely general bandpass signals. This same
observation applies equally to a (wide-sense) stationary bandpass noise
process, n(t), the sample functions of which have infinite energy and
therefore do not possess a Fourier transform. Again from Fig. 7.17a, with
x(t) = n(t), we have directly

$n(t) = n_c(t)\, \sqrt{2} \cos \omega_0 t + n_s(t)\, \sqrt{2} \sin \omega_0 t.$ (7.28)

Any bandpass noise process, n(t), of bandwidth 2W may therefore be
decomposed into two lowpass noise processes, $n_c(t)$ and $n_s(t)$, each of
bandwidth W.†

Figure 7.17a Decomposition of a bandpass waveform x(t) into two lowpass waveforms.

Equation 7.28 can be interpreted graphically as shown in Fig. 7.17b.
We think of the term $\sqrt{2}\, n_c(t) \cos \omega_0 t$ as being a rapidly rotating phasor, with an
amplitude $\sqrt{2}\, n_c(t)$ that varies slowly with respect to the rate of rotation. The term
$\sqrt{2}\, n_s(t) \sin \omega_0 t$ is another such rotating phasor, shifted 90° in phase in
relation to the first. The waveform n(t) is the projection of the vector sum
of these two phasors on the horizontal axis.

† The normalizing factor $\sqrt{2}$ is often dropped. This causes $\mathcal{R}_c(\tau)$, $\mathcal{R}_s(\tau)$, $\mathcal{R}_{cs}(\tau)$, and
$\mathcal{R}_{sc}(\tau)$ to have twice the values given by Eqs. 7.24.

Figure 7.17b Phasor bandpass noise representation.

For the Gaussian bandpass channel of Fig. 7.16 the two lowpass noise
processes $n_c(t)$ and $n_s(t)$ are statistically independent. If at the transmitter
we use both sine and cosine modulators, in accordance with Eq. 7.27 we
can simultaneously transmit two waveforms through the bandpass filter
$W_0(f)$, each of which is selected under the control of a different,
statistically independent transmitter input. In this case the optimum receiver can
be realized by using both sine and cosine demodulators, each followed by
a baseband receiver that makes optimum decisions independent of the
other. Since sine and cosine carriers are represented by perpendicular
rotating vectors in a phasor diagram, this is called quadrature multiplexing.
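Quadrature multiplexing can be sketched numerically in the noise-free case. In the sketch below the two baseband waveforms are taken to be constants a and b over the observation window, the carrier frequency and sample count are illustrative assumptions, and the lowpass filter is idealized as an average over a whole number of carrier cycles.

```python
import math

# Sketch of quadrature multiplexing based on Eq. 7.27 (noise-free case).
# F0, N, and the amplitudes passed to multiplex() are illustrative assumptions.
F0 = 50.0   # carrier frequency in Hz: 50 complete cycles in the window
N = 1000    # samples over a one-second window
ROOT2 = math.sqrt(2.0)

def multiplex(a: float, b: float):
    """Samples of x(t) = a sqrt(2) cos(w0 t) + b sqrt(2) sin(w0 t)."""
    return [a * ROOT2 * math.cos(2 * math.pi * F0 * k / N)
            + b * ROOT2 * math.sin(2 * math.pi * F0 * k / N)
            for k in range(N)]

def demodulate(samples):
    """Cosine and sine demodulators, each followed by averaging (the lowpass stage)."""
    a_hat = sum(x * ROOT2 * math.cos(2 * math.pi * F0 * k / N)
                for k, x in enumerate(samples)) / N
    b_hat = sum(x * ROOT2 * math.sin(2 * math.pi * F0 * k / N)
                for k, x in enumerate(samples)) / N
    return a_hat, b_hat

a_hat, b_hat = demodulate(multiplex(0.7, -1.3))
print("cosine channel:", a_hat)
print("sine channel  :", b_hat)
```

Each demodulator recovers its own amplitude and rejects the other, which is the orthogonality that the phasor picture expresses geometrically.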

Single-Sideband Modulation 

A second way of performing frequency translation from baseband to
passband, called single-sideband modulation (SSB), is illustrated in Fig. 7.18.
A lowpass waveform $s_i(t)$ is first multiplied by $2 \cos \omega_1 t$, which yields the
familiar double-sided frequency spectrum, symmetric about $\pm f_1$. Because
of this symmetry it is clear that both sidebands are not required to
reconstruct the original $s_i(t)$. Accordingly, in an SSB system one of these
sidebands is eliminated before transmission, perhaps by means of a sharp
cutoff filter which we idealize for analytical purposes by the filter $W_1(f)$
shown in the figure. The transmitted signal $s_i^\circ(t)$ then occupies only a
bandwidth W rather than 2W as in DSB-SC.

Figure 7.18 Single-sideband modulation; $\omega_1 \triangleq 2\pi f_1$. The filter $W_1(f)$ passes
$f_1 < |f| < f_1 + W$. The baseband signal spectrum $S_i(f)$ is presumed to be
identically zero for |f| > W.

Figure 7.19 Single-sideband demodulator.

An SSB demodulator is shown in Fig. 7.19, and the spectral
transformations corresponding to SSB modulation followed by SSB
demodulation are illustrated in Fig. 7.20. It is clear that in the absence of noise
any Fourier-transformable lowpass signal $s_i(t)$ passing through the SSB
modulator-demodulator cascade is reproduced exactly at the output.
Conversely, any Fourier-transformable bandpass waveform s°(t) applied
first to the SSB demodulator and then to the SSB modulator, as shown in
Fig. 7.21, is also reproduced exactly.

Figure 7.20 Spectra for single-sideband modulation: (a) unmodulated; (b) after
modulation; (c) after multiplication by $2 \cos \omega_1 t$ but before filtering by W(f).

Figure 7.21 SSB demodulator-modulator cascade.

The normalizing factor 2 multiplying $\cos\omega_1 t$ in Fig. 7.18 has been
chosen so that the energy in any transmitted signal $s_i^\circ(t)$ is again held
equal to the energy in $s_i(t)$. That this equality is preserved is clear from
Fig. 7.20.
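
As an informal numerical check of this normalization, the following Python sketch follows a single test tone through the modulator and demodulator. The sampling rate, carrier frequency, and tone frequency are illustrative assumptions, and the ideal filters $W_1(f)$ and $W(f)$ are emulated analytically (by keeping the upper-sideband term and by projecting onto the test tone) rather than implemented as filters.

```python
import math

FS = 1000   # sampling rate in Hz (assumed)
F1 = 100    # carrier frequency f_1 in Hz (assumed)
FM = 10     # test-tone frequency in Hz, taken inside |f| < W (assumed)
N = FS      # one second of samples: a whole number of periods of every tone used

t = [n / FS for n in range(N)]
s = [math.cos(2 * math.pi * FM * tn) for tn in t]

# 2 cos(w1 t) * cos(wm t) = cos((w1 + wm) t) + cos((w1 - wm) t).
# The ideal sideband filter is applied analytically by keeping the upper term only.
s_ssb = [math.cos(2 * math.pi * (F1 + FM) * tn) for tn in t]


def mean_square(x):
    """Average power of a sampled waveform."""
    return sum(v * v for v in x) / len(x)


# Demodulator: multiply by 2 cos(w1 t), then isolate the component at FM.
# Projecting onto the test tone stands in for the ideal lowpass filter W(f).
d = [2 * math.cos(2 * math.pi * F1 * tn) * v for tn, v in zip(t, s_ssb)]
recovered_amplitude = 2 * sum(dv * sv for dv, sv in zip(d, s)) / N

power_in = mean_square(s)       # average power of the lowpass tone
power_out = mean_square(s_ssb)  # average power of the transmitted sideband
```

Under these assumptions the transmitted sideband carries the same average power as the lowpass tone, and the tone is recovered with unit amplitude.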

Just as in DSB-SC, the use of an SSB modulator at the transmitter and
an SSB demodulator at the receiver entails no loss of optimality when
additive white Gaussian noise is present. Proof is exactly parallel to that
for DSB-SC. The notation used in the arguments that follow is defined
in the system diagram of Fig. 7.22.

Figure 7.22 Single-sideband system. We assume the channel adds white Gaussian
noise but propagates $s^\circ(t)$ without other disturbance.

The first step is to note that $s^\circ(t)$ passes through $W_1(f)$ undistorted and
that the noise in $r(t)$ outside the passband of $W_1(f)$ is independent of the
noise within the band, hence is irrelevant. Thus the insertion of the
receiving filter $W_1(f)$ in Fig. 7.22 entails no loss of optimality.

The second step is to note that the cascade of an SSB demodulator and
an SSB modulator (already considered in Fig. 7.21) is equivalent to the
filter $W_1(f)$ alone. This follows directly from the fact that the response,
$w_1(t - \tau)$, of $W_1(f)$ to an impulse applied at any time $\tau$ is a band-limited
waveform that passes through the demodulator-modulator cascade of
Fig. 7.21 unchanged. Thus the entire SSB demodulator is a reversible
operation insofar as the relevant received signal $r_1(t)$ in Fig. 7.22 is concerned.
Accordingly, we can again operate directly on the baseband
signal $r_2(t)$ without loss of optimality.


Figure 7.23 Determination of the even and odd parts of $S_{n_1}(f - f_1)$ over $[-W, W]$.
See also Figure 7.15.
[An immediate implication is that we now have a second representation
for a bandpass noise process. The process $n_1(t)$, with bandwidth $W$, completely
determines and is determined by a single lowpass process $n_2(t)$,
also with bandwidth $W$. In contradistinction, for DSB-SC we represented
a bandpass noise process with bandwidth $2W$ by means of two lowpass
processes each with bandwidth $W$.]

The third step in proving no loss of optimality with SSB requires
knowledge of the power spectrum $S_{n_2}(f)$ of $n_2(t)$, which can be determined
immediately by application of Eq. 7.25b (see Fig. 7.23). If the normalizing
factor multiplying $\cos\omega_1 t$ in the SSB demodulator were $\sqrt{2}$, we would
have an output noise-power spectrum of $\mathcal{N}_0/4$ over the band $[-W, W]$.
Since the actual factor is 2, however, we multiply $\mathcal{N}_0/4$ by $(2/\sqrt{2})^2$ and
obtain

$$S_{n_2}(f) = \begin{cases} \mathcal{N}_0/2, & |f| < W \\ 0, & \text{elsewhere.} \end{cases} \tag{7.29}$$

The final step concerns the optimum reception of lowpass signals in
the additive Gaussian noise $n_2(t)$. This noise is flat, with power density
$\mathcal{N}_0/2$, over the band occupied by the signals. Thus the receiver of Fig. 7.22
is indeed optimum, and the use of SSB has in no way affected the attainable
accuracy of communication. It is important to note that in the SSB
receiver the preliminary filter $W_1(f)$ is essential, whereas prefiltering is
not essential with DSB-SC. The reason is that $W_1(f)$ prevents noise in
the frequency interval $[f_1 - W, f_1]$ from contributing to the demodulator
output $r_2(t)$.


Comparison of DSB-SC and SSB 

The foregoing analyses have shown that there is no difference for an
idealized mathematical model between the best obtainable performance
with DSB-SC and SSB, but many engineering subtleties enter into a
preference for one or the other.† For example, for any given set $\{s_i(t)\}$ of
lowpass modulating waveforms with bandwidth $W$, SSB requires one half
of the bandwidth of DSB-SC, which permits frequency multiplexing and
alleviates the frequency assignment problem when the spectrum is crowded.
Use of quadrature multiplexing with DSB-SC can balance the accounts;
however, departures of a bandpass channel from our idealized model may
introduce more cross talk (co-channel interference) with quadrature multiplexing
than with frequency multiplexing. In particular, cross talk is
introduced with quadrature multiplexing by channels distorted by a time-invariant
linear filter $H(f)$ unless $H(f_0 + \Delta f) = H^*(f_0 - \Delta f)$ for all
$|\Delta f| < W$, whereas cross talk with ideal frequency multiplexing is introduced
by a linear filter only if it is time-variant. An advantage of DSB-SC
is that high-power SSB signals are more difficult to generate.

† See the Single Sideband Issue, Proc. IRE, 44, No. 12, December 1956.


7.3 RANDOM AMPLITUDE AND PHASE 

Thus far we have assumed that the signal has been subjected only to 
transformations whose characteristics are precisely known by the receiver. 
In particular, we assumed in the bandpass signal case that both the gain 
and phase of the signal component of the received waveform were known 
exactly. In practice, this knowledge is not always available. We now 
consider certain simple mathematical models of filtered-signal channels in 
which the signal filtering depends on one or more unknown (random) 
parameters. 

Random Amplitude 

An elementary random-parameter channel is the pure fading model in 
which the received signal is 

$$r(t) = a\,s(t) + n_w(t), \tag{7.30}$$

where $a$ is a random variable with known probability density function $p_a$.
We assume that $a$ is statistically independent both of the transmitted
signal $s(t)$ and the additive white Gaussian noise $n_w(t)$.

In cases such as that of Eq. 7.30 the derivation of the optimum receiver
involves only a simple extension of preceding results. We again set
$\hat{m} = m_i$ if and only if $i$ maximizes the quantity

$$P[m_i \mid \mathbf{r} = \boldsymbol{\rho}] = \frac{P[m_i]\, p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i)}{p_{\mathbf{r}}(\boldsymbol{\rho})}. \tag{7.31}$$

Here the correspondence between waveforms and vectors is the same as in
Chapter 4. For simplicity, we assume throughout the rest of this chapter
that the $\{m_i\}$ are equally likely. It follows that the optimum receiver need
only determine the $i$ that maximizes $p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i)$.

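A random parameter such as $a$ enters into the optimum receiver
formulation by virtue of the fact that

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i) = \int_{-\infty}^{\infty} p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a = \alpha)\, p_a(\alpha)\, d\alpha. \tag{7.32}$$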
It is convenient, now and in the sequel, to introduce the notation

$$\int_{-\infty}^{\infty} p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a = \alpha)\, p_a(\alpha)\, d\alpha \triangleq \overline{p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a)}. \tag{7.33}$$

The interpretation of Eq. 7.33 is as follows: for given values of $\boldsymbol{\rho}$ and $i$,
the entity $p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a = \alpha)$ is a number whose value depends on $\alpha$. But
$a$ is a random variable, and the probability that $a$ will lie in an interval
$[\alpha, \alpha + d\alpha]$ is $p_a(\alpha)\, d\alpha$. This probability assignment, together with the
collection of numbers $\{p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a = \alpha)\}$ obtained by varying $\alpha$ with $\boldsymbol{\rho}$
and $i$ held fixed, defines another random variable which we denote
$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a)$. (A function of a random variable is a random variable.)

From the theorem of expectation (Eq. 2.126) the mean value of this
new random variable is

$$E[p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a)] = \int_{-\infty}^{\infty} p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a = \alpha)\, p_a(\alpha)\, d\alpha,$$

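which is Eq. 7.33. Similarly, if $\mathbf{x}$ and $\mathbf{y}$ are random vectors and $A$ is a
random event, we write

$$p_{\mathbf{x}}(\boldsymbol{\beta}) = \int_{-\infty}^{\infty} p_{\mathbf{x}}(\boldsymbol{\beta} \mid \mathbf{y} = \boldsymbol{\gamma})\, p_{\mathbf{y}}(\boldsymbol{\gamma})\, d\boldsymbol{\gamma} = \overline{p_{\mathbf{x}}(\boldsymbol{\beta} \mid \mathbf{y})} \tag{7.34a}$$

and

$$P[A] = \int_{-\infty}^{\infty} P[A \mid \mathbf{y} = \boldsymbol{\gamma}]\, p_{\mathbf{y}}(\boldsymbol{\gamma})\, d\boldsymbol{\gamma} = \overline{P[A \mid \mathbf{y}]}. \tag{7.34b}$$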

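In this abbreviated notation Eq. 7.32 becomes

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i) = \overline{p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a)}. \tag{7.35}$$

Equation 7.35 provides the key to the analysis of the optimum receiver for
the fading case of Eq. 7.30. With white Gaussian noise,

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, a) = p_{\mathbf{n}}(\boldsymbol{\rho} - a\mathbf{s}_i) \sim \exp\left(-\frac{1}{\mathcal{N}_0}\,|\boldsymbol{\rho} - a\mathbf{s}_i|^2\right) \tag{7.36a}$$

and therefore

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i) \sim \overline{\exp\left(-\frac{1}{\mathcal{N}_0}\,|\boldsymbol{\rho} - a\mathbf{s}_i|^2\right)}. \tag{7.36b}$$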


Whether or not this average can be evaluated (and meaningfully interpreted)
depends on both the $\{\mathbf{s}_i\}$ and $p_a$. From the vector-signal point of
view it is clear that the attenuation factor $a$ corresponds to a radial
scaling of the received signal constellation $\{a\mathbf{s}_i\}$, as shown in Fig. 7.24.
If $a$ were known to the receiver, it would use this knowledge in the
determination of optimum decision regions (see Chapter 4). Accordingly,
we may expect that the optimum decision regions when $a$ is not known
will be some complicated compromise between the different optimum
regions implied by the different values that $a$ may assume, weighted in
accordance with the probability density with which $a$ assumes such values.

In one instance the situation is uncomplicated. If the attenuation $a$
is always positive and if the signals are all of equal energy $E_s$, then the
boundaries of the optimum decision regions are themselves radial, as
shown in Fig. 7.25, and thus invariant to radial scaling of the received
signals. Under these conditions, it is obvious that a correlation or
matched-filter receiver structure is still optimum, completely independent
of whether $a$, or even $p_a$, is known precisely.

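The validity of this argument can be checked formally by eliminating
terms that are independent of $i$ in Eq. 7.36b. With equal-energy signals
we have

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i) \sim \int_{-\infty}^{\infty} \exp\left(\frac{2\alpha}{\mathcal{N}_0}\,\boldsymbol{\rho}\cdot\mathbf{s}_i\right) e^{-\alpha^2 E_s/\mathcal{N}_0}\, p_a(\alpha)\, d\alpha. \tag{7.37}$$

For any $p_a$ such that $p_a(\alpha) = 0$ for $\alpha < 0$, the index $i$ that maximizes this
integral is the index $i$ that maximizes $\boldsymbol{\rho}\cdot\mathbf{s}_i$.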
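
The invariance can be illustrated numerically. The sketch below compares the minimum-distance decision a receiver would make if it knew the gain with the plain correlation decision that ignores it. The four-point constellation, the received vector, and the function names are hypothetical choices for illustration only.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def correlation_decision(rho, signals):
    """Index i maximizing rho . s_i (the rule that ignores the gain)."""
    return max(range(len(signals)), key=lambda i: dot(rho, signals[i]))


def known_gain_decision(rho, signals, gain):
    """Index i minimizing |rho - gain * s_i|^2 (the rule when the gain is known)."""
    def dist2(i):
        return sum((r - gain * s) ** 2 for r, s in zip(rho, signals[i]))
    return min(range(len(signals)), key=dist2)


# Four equal-energy signal vectors (hypothetical constellation).
signals = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
rho = (0.3, 0.8)  # one received vector (illustrative)

baseline = correlation_decision(rho, signals)
agreement = [known_gain_decision(rho, signals, g) == baseline
             for g in (0.05, 0.5, 1.0, 4.0, 25.0)]
```

For every positive gain tried, the two rules select the same index, as the radial decision boundaries require.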

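On the other hand, it must be emphasized that the error performance
provided by the optimum receiver is very much affected by $p_a$. For
example, if equally likely binary signals are used, with

$$\mathbf{s}_0 = -\mathbf{s}_1 = \sqrt{E_s}\,\boldsymbol{\varphi}_1, \tag{7.38a}$$

then the received signal energy when $a = \alpha$ is $\alpha^2 E_s$ and

$$P[\mathcal{E}] = \int_{-\infty}^{\infty} Q\left(\alpha\sqrt{\frac{2E_s}{\mathcal{N}_0}}\right) p_a(\alpha)\, d\alpha = \overline{Q\left(a\sqrt{\frac{2E_s}{\mathcal{N}_0}}\right)}. \tag{7.38b}$$

If $a$ has a high probability of being near zero, the probability of error is
large. In any event, it is shown in Appendix 7B that

$$P[\mathcal{E}] \geq Q\left(\bar{a}\sqrt{\frac{2E_s}{\mathcal{N}_0}}\right), \tag{7.38c}$$

in which $\bar{a}$ is the mean of $a$. The equal sign holds if and only if $a$ is not
random; that is, if $p_a(\alpha) = \delta(\alpha - \bar{a})$.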
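
A small numerical illustration of Eqs. 7.38b and 7.38c follows. It is only a sketch: the continuous density $p_a$ is replaced by a hypothetical two-level fade with unit mean, and the value of $E_s/\mathcal{N}_0$ is an arbitrary choice.

```python
import math


def Q(x):
    """Gaussian tail probability Q(x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fading_error_probability(gains, probs, es_over_n0):
    """Average of Q(alpha * sqrt(2 Es/N0)) over a discrete gain distribution (Eq. 7.38b)."""
    root = math.sqrt(2.0 * es_over_n0)
    return sum(p * Q(a * root) for a, p in zip(gains, probs))


es_over_n0 = 4.0                        # illustrative energy-to-noise ratio
gains, probs = (0.2, 1.8), (0.5, 0.5)   # hypothetical two-level fade

mean_gain = sum(a * p for a, p in zip(gains, probs))
p_fading = fading_error_probability(gains, probs, es_over_n0)
p_fixed = Q(mean_gain * math.sqrt(2.0 * es_over_n0))  # right side of Eq. 7.38c
```

With these numbers the deep fade dominates the average, so the error probability with fading is far larger than that of a fixed gain equal to the mean.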

Random Phase 

Random phase introduces additional complication into the analysis
of the optimum receiver for bandpass signals. We now consider the
situation in which the transmitted signal is

$$s^\circ(t) = s(t)\sqrt{2}\cos(\omega_0 t - \theta), \tag{7.39a}$$

where $s(t)$ is chosen from a set of equally likely signals $\{s_i(t)\}$ of ideal
lowpass bandwidth $W$ and $\theta$ is a random variable with a probability
density function $p_\theta$ that is uniform over the interval $[0, 2\pi]$:

$$p_\theta(\alpha) = \begin{cases} \dfrac{1}{2\pi}, & 0 \leq \alpha < 2\pi \\ 0, & \text{elsewhere.} \end{cases} \tag{7.39b}$$

As usual, we shall consider the received signal to be corrupted with
additive white Gaussian noise:

$$r(t) = s^\circ(t) + n_w(t). \tag{7.39c}$$

The transmitted signal $s^\circ(t)$ corresponds to the DSB-SC signal in
Fig. 7.9, except that $\theta$ reflects uncertainty at the receiver about the exact
phase of the received signal. This uncertainty may develop in many
different ways; for example, by slow oscillator drift or by small random
changes (of the order of $1/f_0$) in the propagation time between transmitter
and receiver. In spite of the unknown phase, a DSB-SC demodulator with
sine and cosine channel outputs may still be used as the first stage of an
optimum receiver, as shown in Fig. 7.26: the only irreversible effect is the
discarding of noise outside the band occupied by the signals.

Figure 7.26 DSB-SC demodulation with random phase; the demodulator outputs are
$r_c(t) = s(t)\cos\theta + n_c(t)$ and $r_s(t) = s(t)\sin\theta + n_s(t)$. It is convenient to ascribe
the parameter $\theta$ to the transmitter, even though the random phase may actually
originate within the channel. With this convention and appropriately band-limited
signals, the channel disturbs transmission only by adding white Gaussian noise, so
that $r(t) = s(t)\sqrt{2}\cos(\omega_0 t - \theta) + n_w(t)$.

Recalling the trigonometric identity

$$\cos(x - y) = \cos x \cos y + \sin x \sin y, \tag{7.40}$$

we can write

$$s^\circ(t) = \sqrt{2}\,[s(t)\cos\theta]\cos\omega_0 t + \sqrt{2}\,[s(t)\sin\theta]\sin\omega_0 t. \tag{7.41}$$
The two demodulated waveforms in Fig. 7.26 are therefore

$$r_c(t) = s(t)\cos\theta + n_c(t) \tag{7.42a}$$

$$r_s(t) = s(t)\sin\theta + n_s(t). \tag{7.42b}$$

The corresponding vector representation is

$$\mathbf{r}_c = \mathbf{s}\cos\theta + \mathbf{n}_c \tag{7.43a}$$

$$\mathbf{r}_s = \mathbf{s}\sin\theta + \mathbf{n}_s, \tag{7.43b}$$

where $\mathbf{s}$, $\mathbf{n}_c$, and $\mathbf{n}_s$ denote the projections of the random processes $s(t)$,
$n_c(t)$, and $n_s(t)$ on the set of orthonormal functions $\{\varphi_j(t)\}$ used to describe
the lowpass signal set

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\,\varphi_j(t); \qquad i = 0, 1, \ldots, M - 1.$$

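The first important observation to make about Eqs. 7.43 is that both
demodulator outputs contain signal components and that the optimum
receiver must therefore observe them both. Accordingly, given $\mathbf{r}_c = \boldsymbol{\alpha}$
and $\mathbf{r}_s = \boldsymbol{\beta}$, the optimum receiver sets $\hat{m}$ equal to that $m_i$ for which
$P[m_i \mid \mathbf{r}_c = \boldsymbol{\alpha}, \mathbf{r}_s = \boldsymbol{\beta}]$ is maximum. Since we are assuming that all $\{m_i\}$ are
equally likely, this is equivalent to maximizing

$$p_{\mathbf{r}_c,\mathbf{r}_s}(\boldsymbol{\alpha}, \boldsymbol{\beta} \mid m_i) = \overline{p_{\mathbf{r}_c,\mathbf{r}_s}(\boldsymbol{\alpha}, \boldsymbol{\beta} \mid m_i, \theta)}, \tag{7.44}$$

in which we again use the notation of Eq. 7.33.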

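The next important observation is that $n_c(t)$ and $n_s(t)$ are (see Eq. 7.25c)
statistically independent stationary Gaussian processes, each having a
power spectrum that is uniform (with density $\mathcal{N}_0/2$) over the rectangular
frequency band $[-W, W]$ occupied by $s(t)$. Since noise power outside
the band occupied by the signals does not affect the optimum receiver, we
may assume $n_c(t)$ and $n_s(t)$ to be white. Projections of these independent
processes onto orthonormal functions yield independent Gaussian variables,
each of variance $\mathcal{N}_0/2$. Hence

$$p_{\mathbf{n}_c,\mathbf{n}_s}(\boldsymbol{\mu}, \boldsymbol{\nu}) = p_{\mathbf{n}_c}(\boldsymbol{\mu})\, p_{\mathbf{n}_s}(\boldsymbol{\nu}) = \frac{1}{(\pi\mathcal{N}_0)^N} \exp\left[-\frac{1}{\mathcal{N}_0}\left(|\boldsymbol{\mu}|^2 + |\boldsymbol{\nu}|^2\right)\right], \tag{7.45}$$

in which $N$ is the number of orthonormal functions in the set $\{\varphi_j(t)\}$ used
to describe the $\{s_i(t)\}$. Thus

$$p_{\mathbf{r}_c,\mathbf{r}_s}(\boldsymbol{\alpha}, \boldsymbol{\beta} \mid m_i, \theta) = p_{\mathbf{n}_c}(\boldsymbol{\alpha} - \mathbf{s}_i\cos\theta)\, p_{\mathbf{n}_s}(\boldsymbol{\beta} - \mathbf{s}_i\sin\theta)$$

$$= \frac{1}{(\pi\mathcal{N}_0)^N} \exp\left[-\frac{1}{\mathcal{N}_0}\left(|\boldsymbol{\alpha} - \mathbf{s}_i\cos\theta|^2 + |\boldsymbol{\beta} - \mathbf{s}_i\sin\theta|^2\right)\right].$$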
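Expanding the magnitude-squared terms in the exponent and dropping
factors independent of $i$, we find that the receiver must maximize

$$p_{\mathbf{r}_c,\mathbf{r}_s}(\boldsymbol{\alpha}, \boldsymbol{\beta} \mid m_i) \sim \overline{\exp\left[\frac{2}{\mathcal{N}_0}\left(\boldsymbol{\alpha}\cdot\mathbf{s}_i\cos\theta + \boldsymbol{\beta}\cdot\mathbf{s}_i\sin\theta\right)\right]}\; e^{-E_i/\mathcal{N}_0}, \tag{7.46}$$

in which $E_i$ is the energy of $s_i(t)$.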

The form of the optimum receiver may be extracted from Eq. 7.46 in a
straightforward manner. We define, for $i = 0, 1, \ldots, M - 1$,

$$X_i \triangleq \sqrt{(\mathbf{r}_c\cdot\mathbf{s}_i)^2 + (\mathbf{r}_s\cdot\mathbf{s}_i)^2} \tag{7.47a}$$

and

$$\phi_i \triangleq \tan^{-1}\frac{\mathbf{r}_s\cdot\mathbf{s}_i}{\mathbf{r}_c\cdot\mathbf{s}_i}, \tag{7.47b}$$

where

$$\mathbf{r}_c\cdot\mathbf{s}_i = \int_{-\infty}^{\infty} r_c(t)\, s_i(t)\, dt \tag{7.47c}$$

and

$$\mathbf{r}_s\cdot\mathbf{s}_i = \int_{-\infty}^{\infty} r_s(t)\, s_i(t)\, dt \tag{7.47d}$$

are the correlations of the $i$th signal with the outputs of the cosine and
sine demodulator, respectively. Applying the identity of Eq. 7.40 and
invoking the transformation of Fig. 7.27, we have

$$X_i\cos(\theta - \phi_i) = \mathbf{r}_c\cdot\mathbf{s}_i\cos\theta + \mathbf{r}_s\cdot\mathbf{s}_i\sin\theta. \tag{7.47e}$$

Figure 7.27 Polar transformation of matched-filter outputs.

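In particular, when $\mathbf{r}_c = \boldsymbol{\alpha}$ and $\mathbf{r}_s = \boldsymbol{\beta}$,

$$X_i\cos(\theta - \phi_i) = \boldsymbol{\alpha}\cdot\mathbf{s}_i\cos\theta + \boldsymbol{\beta}\cdot\mathbf{s}_i\sin\theta \tag{7.48a}$$

and the average in Eq. 7.46 may be written

$$\overline{\exp\left[\frac{2X_i}{\mathcal{N}_0}\cos(\theta - \phi_i)\right]} = \int_{-\infty}^{\infty} \exp\left[\frac{2X_i}{\mathcal{N}_0}\cos(\gamma - \phi_i)\right] p_\theta(\gamma)\, d\gamma = \frac{1}{2\pi}\int_0^{2\pi} \exp\left[\frac{2X_i}{\mathcal{N}_0}\cos(\gamma - \phi_i)\right] d\gamma. \tag{7.48b}$$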
This integral has been encountered often enough to have been given a
name and to have been tabulated. Specifically,

$$I_0(x) \triangleq \frac{1}{2\pi}\int_0^{2\pi} e^{x\cos\alpha}\, d\alpha, \tag{7.49a}$$

where $I_0(x)$ is called the “zero-order modified Bessel function of the first
kind” (Refs. 46, 56). A plot of $I_0(x)$ is given in Fig. 7.28. Because of the periodicity
of the cosine, for any $\phi$ we also have

$$I_0(x) = \frac{1}{2\pi}\int_0^{2\pi} e^{x\cos(\alpha + \phi)}\, d\alpha. \tag{7.49b}$$

Figure 7.28 Plot of $I_0(x)$.

Hence, given observed values of the $\{X_i\}$, the optimum decision rule is to
set $\hat{m} = m_i$ if and only if

$$I_0\left(\frac{2X_i}{\mathcal{N}_0}\right) e^{-E_i/\mathcal{N}_0} \tag{7.50}$$

is maximum.
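
The defining integral of Eq. 7.49a is easy to evaluate numerically, which gives a convenient check on tabulated values. The following sketch uses the midpoint rule for the integral and, for comparison, the standard power series $\sum_k (x/2)^{2k}/(k!)^2$. The function names, step count, and number of series terms are illustrative assumptions.

```python
import math


def i0_integral(x, steps=2000):
    """I_0(x) from the defining integral of Eq. 7.49a, by the midpoint rule."""
    h = 2.0 * math.pi / steps
    total = sum(math.exp(x * math.cos((k + 0.5) * h)) for k in range(steps))
    return total * h / (2.0 * math.pi)


def i0_series(x, terms=40):
    """I_0(x) from the power series sum_k (x/2)^(2k) / (k!)^2."""
    return sum((x / 2.0) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))


def decision_metric(x_i, e_i, n0):
    """The quantity of Eq. 7.50: I_0(2 X_i / N0) * exp(-E_i / N0)."""
    return i0_series(2.0 * x_i / n0) * math.exp(-e_i / n0)


def decide(envelopes, energies, n0):
    """Index i maximizing the metric of Eq. 7.50."""
    return max(range(len(envelopes)),
               key=lambda i: decision_metric(envelopes[i], energies[i], n0))
```

A call such as `decide([1.2, 0.9], [1.0, 0.5], 1.0)` then applies the rule to two hypothetical envelope values with unequal signal energies.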

Additional simplification is possible when the signals $\{s_i(t)\}$ have equal
energy, $E_i = E_s$ for all $i = 0, 1, \ldots, M - 1$. Then we need only to maximize
$I_0(2X_i/\mathcal{N}_0)$; or still more simply, since $I_0(x)$ is a monotone increasing
function of $x$ for $x \geq 0$, we need only to maximize $X_i$ or $X_i^2$, where

$$X_i^2 = (\mathbf{r}_c\cdot\mathbf{s}_i)^2 + (\mathbf{r}_s\cdot\mathbf{s}_i)^2. \tag{7.51}$$

Equation 7.51 specifies one form of optimum receiver for equal-energy
signals. As shown in Fig. 7.29, this optimum receiver correlates the sine
and cosine demodulator outputs against each of the $M$ possible lowpass
signals. For each $i$ the receiver forms the sum of the squares of the
cosine correlation and the sine correlation. The values of the $\{X_i^2\}$ so
obtained are then fed to a comparison device that determines which $X_i^2$ is
largest, $i = 0, 1, \ldots, M - 1$.

Figure 7.29 Correlation receiver for equal-energy signals with random phase;
$r(t) = s(t)\sqrt{2}\cos(\omega_0 t - \theta) + n_w(t)$.
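
A minimal sketch of the decision rule of Eq. 7.51 in vector form follows. The two-signal set and the helper names are hypothetical. In the noiseless case the statistic $X_i^2$ for the transmitted signal equals $E_s^2$ whatever the value of $\theta$, which the sketch verifies.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def envelope_squared(r_c, r_s, s_i):
    """X_i^2 of Eq. 7.51: squared cosine correlation plus squared sine correlation."""
    return dot(r_c, s_i) ** 2 + dot(r_s, s_i) ** 2


def incoherent_decision(r_c, r_s, signals):
    """Index of the signal with the largest X_i^2."""
    return max(range(len(signals)),
               key=lambda i: envelope_squared(r_c, r_s, signals[i]))


def noiseless_outputs(s, theta):
    """Demodulator output vectors of Eqs. 7.43 with the noise set to zero."""
    return (tuple(v * math.cos(theta) for v in s),
            tuple(v * math.sin(theta) for v in s))


# Two equal-energy orthogonal signals with E_s = 4 (hypothetical).
signals = [(2.0, 0.0), (0.0, 2.0)]
phases = [0.0, 0.7, math.pi / 2, 2.5, 4.0, 6.0]

results = []
for theta in phases:
    for i, s in enumerate(signals):
        r_c, r_s = noiseless_outputs(s, theta)
        results.append((incoherent_decision(r_c, r_s, signals) == i,
                        envelope_squared(r_c, r_s, s)))
```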

Just as with the known-phase receiver of Fig. 7.16, the fact that the
demodulator outputs are correlated against the lowpass signals $\{s_i(t)\}$
implies that the lowpass filters $W(f)$ at the demodulator outputs in
Fig. 7.26 are redundant; they have been eliminated in Fig. 7.29. It is
clear from this figure that Eqs. 7.47c and d may also be written

$$\mathbf{r}_c\cdot\mathbf{s}_i = \int_{-\infty}^{\infty} r(t)\sqrt{2}\, s_i(t)\cos\omega_0 t\, dt \tag{7.52a}$$

$$\mathbf{r}_s\cdot\mathbf{s}_i = \int_{-\infty}^{\infty} r(t)\sqrt{2}\, s_i(t)\sin\omega_0 t\, dt. \tag{7.52b}$$

An alternative form of optimum receiver is shown in Fig. 7.30, in
which lowpass matched filters with impulse response

$$g_i(t) = s_i(T - t); \qquad i = 0, 1, \ldots, M - 1 \tag{7.53}$$

are substituted for the correlators. Of course, all ideal lowpass signals
must have infinite duration and the $\{g_i(t)\}$ are not physically realizable.
As a practical matter, however, the receiver of Fig. 7.30 becomes both
physically realizable and essentially optimum whenever almost all of the
energy in each of the $\{s_i(t)\}$ is located within $0 \leq t \leq T$ and $-W < f < W$.

Figure 7.30 Matched-filter receiver for equal-energy signals with random phase.

Envelope detection. A substantially different form of optimum detection
for equally likely equal-energy signals with random phase may be
realized as follows. Let us consider feeding the signal $r(t)$ in Fig. 7.26
directly into a filter bank with impulse responses $\{h_i(t)\}$ matched to the
bandpass signals $\{s_i(t)\sqrt{2}\cos\omega_0 t\}$ except for the discrepancy necessitated by the fact
that the phase is unknown. We define

$$h_i(t) \triangleq s_i(T - t)\sqrt{2}\cos\omega_0 t; \qquad i = 0, 1, \ldots, M - 1. \tag{7.54}$$
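The output of the $i$th filter, say $u_i(t)$, is

$$\begin{aligned}
u_i(t) &= \int_{-\infty}^{\infty} r(\alpha)\, h_i(t - \alpha)\, d\alpha \\
&= \sqrt{2}\int_{-\infty}^{\infty} r(\alpha)\, s_i(T - t + \alpha)\cos\omega_0(t - \alpha)\, d\alpha \\
&= \sqrt{2}\cos\omega_0 t\int_{-\infty}^{\infty} r(\alpha)\, s_i(T - t + \alpha)\cos\omega_0\alpha\, d\alpha \\
&\quad + \sqrt{2}\sin\omega_0 t\int_{-\infty}^{\infty} r(\alpha)\, s_i(T - t + \alpha)\sin\omega_0\alpha\, d\alpha \\
&= u_{ci}(t)\cos\omega_0 t + u_{si}(t)\sin\omega_0 t.
\end{aligned} \tag{7.55}$$

In the last line we have introduced the definitions

$$u_{ci}(t) \triangleq \int_{-\infty}^{\infty} r(\alpha)\, s_i(T - t + \alpha)\sqrt{2}\cos\omega_0\alpha\, d\alpha \tag{7.56a}$$

and

$$u_{si}(t) \triangleq \int_{-\infty}^{\infty} r(\alpha)\, s_i(T - t + \alpha)\sqrt{2}\sin\omega_0\alpha\, d\alpha. \tag{7.56b}$$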

Since the variation in time of the components $u_{ci}(t)$ and $u_{si}(t)$ is due
solely to $s_i(t)$, which is a low-frequency (hence slowly varying) signal,
both $u_{ci}(t)$ and $u_{si}(t)$ remain relatively constant over several cycles of
$\cos\omega_0 t$. Thus it is useful to use Eq. 7.40 again and to write the matched-filter
output as

$$u_i(t) = X_i(t)\cos[\omega_0 t - \phi_i(t)], \tag{7.57a}$$

where

$$X_i(t) = \sqrt{u_{ci}^2(t) + u_{si}^2(t)} \tag{7.57b}$$

is a slowly varying “envelope” and

$$\phi_i(t) = \tan^{-1}\frac{u_{si}(t)}{u_{ci}(t)} \tag{7.57c}$$

is a slowly varying phase.
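
The envelope and phase form of Eqs. 7.57 can be checked numerically for arbitrary component values. In the sketch below the two-argument arctangent is used so that the phase falls in the correct quadrant. The function names and test values are illustrative assumptions.

```python
import math


def envelope_and_phase(u_c, u_s):
    """Envelope X and phase phi of Eqs. 7.57b and 7.57c (four-quadrant arctangent)."""
    return math.hypot(u_c, u_s), math.atan2(u_s, u_c)


def quadrature_form(u_c, u_s, w0_t):
    """u_c cos(w0 t) + u_s sin(w0 t), the last line of Eq. 7.55."""
    return u_c * math.cos(w0_t) + u_s * math.sin(w0_t)


def envelope_form(u_c, u_s, w0_t):
    """X cos(w0 t - phi), the form of Eq. 7.57a."""
    x, phi = envelope_and_phase(u_c, u_s)
    return x * math.cos(w0_t - phi)


# Compare the two forms for components in all four quadrants.
components = [(1.0, 0.5), (-0.3, 2.0), (-1.5, -0.2), (0.7, -0.9)]
angles = [0.0, 0.4, 1.9, 3.3, 5.8]
max_difference = max(abs(quadrature_form(c, s, a) - envelope_form(c, s, a))
                     for c, s in components for a in angles)
```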

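In conclusion, we observe from Eqs. 7.52 and the definitions of Eqs.
7.56 that at the instant $t = T$

$$u_{ci}(T) = \mathbf{r}_c\cdot\mathbf{s}_i, \qquad u_{si}(T) = \mathbf{r}_s\cdot\mathbf{s}_i. \tag{7.58a}$$

Hence

$$X_i(T) = X_i = \left[(\mathbf{r}_c\cdot\mathbf{s}_i)^2 + (\mathbf{r}_s\cdot\mathbf{s}_i)^2\right]^{1/2} \tag{7.58b}$$

$$\phi_i(T) = \phi_i = \tan^{-1}\frac{\mathbf{r}_s\cdot\mathbf{s}_i}{\mathbf{r}_c\cdot\mathbf{s}_i}. \tag{7.58c}$$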
It follows that an optimum receiver can be built by using bandpass filters
matched (except for the unknown phase) directly to the $\{s_i(t)\sqrt{2}\cos\omega_0 t\}$, followed
by envelope detectors sampled at time $T$, as shown in Fig. 7.31.

Figure 7.31 Envelope-detector receiver for equal-energy signals with random phase;
$h_i(t) = \sqrt{2}\, s_i(T - t)\cos\omega_0 t$; $i = 0, 1, \ldots, M - 1$. The envelopes are sampled at $t = T$.

Figure 7.32 illustrates how the envelope of $u_i(t)$ changes slowly with
time, whereas $u_i(t)$ itself varies sinusoidally with frequency $f_0$ within its
envelope. This fact explains why the optimum receiver looks at the
envelope, rather than at $u_i(t)$ itself, when the phase of the sinusoidal
variation is unknown. Receivers that examine only the envelope of the
matched-filter output, hence do not utilize knowledge of the phase, are
termed incoherent receivers; receivers that do exploit knowledge of the
phase are called coherent.

Probability of error. Since an incoherent receiver does not consider
phase information, it cannot yield so small an error probability as a
coherent receiver. To illustrate this, we now calculate the probability of
error for white Gaussian noise when one of two equally likely messages is
communicated over a system utilizing an incoherent receiver and the
DSB-SC modulated equal-energy orthogonal lowpass signals

$$s_1(t) = \sqrt{E_s}\,\varphi_1(t), \qquad \mathbf{s}_1 = \sqrt{E_s}\,\boldsymbol{\varphi}_1, \tag{7.59a}$$

$$s_2(t) = \sqrt{E_s}\,\varphi_2(t), \qquad \mathbf{s}_2 = \sqrt{E_s}\,\boldsymbol{\varphi}_2. \tag{7.59b}$$

From Eq. 7.51 the optimum incoherent receiver in this case sets $\hat{m} = m_1$
if and only if

$$(\mathbf{r}_c\cdot\mathbf{s}_1)^2 + (\mathbf{r}_s\cdot\mathbf{s}_1)^2 > (\mathbf{r}_c\cdot\mathbf{s}_2)^2 + (\mathbf{r}_s\cdot\mathbf{s}_2)^2. \tag{7.60}$$
Figure 7.32 Envelope detection of (noiseless) pulsed sine wave.

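Each of the vectors in Eq. 7.60 is two-dimensional. In accordance with
Eq. 7.43,

$$\mathbf{r}_c = \mathbf{s}\cos\theta + \mathbf{n}_c$$
$$\mathbf{r}_s = \mathbf{s}\sin\theta + \mathbf{n}_s,$$

so that when $m = m_1$ the vector components of $\mathbf{r}_c$ and $\mathbf{r}_s$ are

$$\begin{aligned}
r_{c1} &\triangleq \mathbf{r}_c\cdot\boldsymbol{\varphi}_1 = \sqrt{E_s}\cos\theta + n_{c1}, \\
r_{s1} &\triangleq \mathbf{r}_s\cdot\boldsymbol{\varphi}_1 = \sqrt{E_s}\sin\theta + n_{s1}, \\
r_{c2} &\triangleq \mathbf{r}_c\cdot\boldsymbol{\varphi}_2 = n_{c2}, \\
r_{s2} &\triangleq \mathbf{r}_s\cdot\boldsymbol{\varphi}_2 = n_{s2}.
\end{aligned} \tag{7.61}$$

In terms of these coefficients, after cancellation of a common factor of
$E_s$, the condition of Eq. 7.60 becomes

$$r_{c1}^2 + r_{s1}^2 > r_{c2}^2 + r_{s2}^2. \tag{7.62}$$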

Since $\varphi_1(t)$ and $\varphi_2(t)$ are orthonormal and $n_c(t)$ and $n_s(t)$ are statistically
independent Gaussian processes, the random variables $n_{c1}$, $n_{s1}$, $n_{c2}$, $n_{s2}$
are statistically independent Gaussian variables, each with zero mean, variance $\mathcal{N}_0/2$, and density function

$$p_n(\alpha) = \frac{1}{\sqrt{\pi\mathcal{N}_0}}\, e^{-\alpha^2/\mathcal{N}_0}. \tag{7.63}$$

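From Eqs. 7.62 and 7.63

$$\begin{aligned}
P[\mathcal{E} \mid m_1, (r_{c1}^2 + r_{s1}^2) = R^2] &= P[r_{c2}^2 + r_{s2}^2 > R^2] \\
&= \iint_{\alpha^2 + \beta^2 > R^2} p_n(\alpha)\, p_n(\beta)\, d\alpha\, d\beta \\
&= \iint_{\alpha^2 + \beta^2 > R^2} \frac{1}{\pi\mathcal{N}_0}\, e^{-(\alpha^2 + \beta^2)/\mathcal{N}_0}\, d\alpha\, d\beta \\
&= \int_R^{\infty} r\, dr \int_0^{2\pi} d\theta\, \frac{1}{\pi\mathcal{N}_0}\, e^{-r^2/\mathcal{N}_0} \\
&= e^{-R^2/\mathcal{N}_0} = e^{-(r_{c1}^2 + r_{s1}^2)/\mathcal{N}_0}.
\end{aligned} \tag{7.64}$$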

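We calculate $P[\mathcal{E} \mid m_1]$ by averaging Eq. 7.64 over the random variable
$(r_{c1}^2 + r_{s1}^2)$. Thus

$$\begin{aligned}
P[\mathcal{E} \mid m_1] &= \overline{P[\mathcal{E} \mid m_1, r_{c1}^2 + r_{s1}^2]} \\
&= \overline{e^{-(r_{c1}^2 + r_{s1}^2)/\mathcal{N}_0}} \\
&= \overline{e^{-r_{c1}^2/\mathcal{N}_0}}\;\overline{e^{-r_{s1}^2/\mathcal{N}_0}}.
\end{aligned} \tag{7.65}$$

The last line exploits the fact that $r_{c1}$ and $r_{s1}$ are statistically independent
under the condition that $m_1$ is transmitted.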

The averaging in Eq. 7.65 is most easily carried out in two steps: we
first calculate the conditional error probability $P[\mathcal{E} \mid m_1, \theta = \gamma]$ and then
average over the random phase $\theta$. Using the notation $E[z \mid y = \beta]$ to
denote the conditional mean of $z$, given $y = \beta$, from Eq. 7.65 we have

$$P[\mathcal{E} \mid m_1, \theta = \gamma] = E[e^{-r_{c1}^2/\mathcal{N}_0} \mid \theta = \gamma]\; E[e^{-r_{s1}^2/\mathcal{N}_0} \mid \theta = \gamma]. \tag{7.66a}$$

When $\theta = \gamma$, the random variables $r_{c1}$ and $r_{s1}$ are Gaussian, with variance
$\mathcal{N}_0/2$ and means

$$\bar{r}_{c1} = \sqrt{E_s}\cos\gamma, \qquad \bar{r}_{s1} = \sqrt{E_s}\sin\gamma. \tag{7.66b}$$

In Appendix 7C we prove the following extremely useful lemma:

Lemma. If $x$ is any Gaussian random variable with mean $m_x$ and variance
$\sigma_x^2$, and $w$ is any complex constant with real part less than $(2\sigma_x^2)^{-1}$, the
expectation of $e^{wx^2}$ is

$$E[e^{wx^2}] = \frac{1}{\sqrt{1 - 2w\sigma_x^2}}\, \exp\left(\frac{w\, m_x^2}{1 - 2w\sigma_x^2}\right); \qquad \operatorname{Re}(w) < \frac{1}{2\sigma_x^2}. \tag{7.67}$$
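
The lemma can be spot-checked numerically for real $w$ by integrating $e^{wx^2}$ against the Gaussian density. The sketch below does this with the midpoint rule. It covers only real values of $w$, and the integration span and step count are illustrative assumptions.

```python
import math


def lemma_closed_form(w, mean, var):
    """Right-hand side of Eq. 7.67 for real w < 1 / (2 * var)."""
    d = 1.0 - 2.0 * w * var
    return math.exp(w * mean ** 2 / d) / math.sqrt(d)


def lemma_numeric(w, mean, var, steps=40000, span=15.0):
    """E[exp(w x^2)] for Gaussian x, by midpoint integration over mean +/- span std. dev."""
    sd = math.sqrt(var)
    lo = mean - span * sd
    h = 2.0 * span * sd / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        # exp(w x^2) times the unnormalized Gaussian density, combined in one exponent
        total += math.exp(w * x * x - (x - mean) ** 2 / (2.0 * var))
    return total * h / math.sqrt(2.0 * math.pi * var)


# Case used in the text: w = -1/N0 and variance N0/2, here with N0 = 2.
case_text = (lemma_closed_form(-0.5, 1.5, 1.0), lemma_numeric(-0.5, 1.5, 1.0))
# A positive w that still satisfies w < 1 / (2 * var).
case_positive = (lemma_closed_form(0.2, 1.5, 1.0), lemma_numeric(0.2, 1.5, 1.0))
```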


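Applying this lemma to Eq. 7.66a, with $w = -1/\mathcal{N}_0$ and $\sigma_x^2 = \mathcal{N}_0/2$, so that $1 - 2w\sigma_x^2 = 2$, we have

$$\begin{aligned}
P[\mathcal{E} \mid m_1, \theta = \gamma] &= \frac{\exp\left(-\dfrac{E_s}{2\mathcal{N}_0}\cos^2\gamma\right)}{\sqrt{2}} \cdot \frac{\exp\left(-\dfrac{E_s}{2\mathcal{N}_0}\sin^2\gamma\right)}{\sqrt{2}} \\
&= \frac{1}{2}\exp\left[-\frac{E_s}{2\mathcal{N}_0}\left(\cos^2\gamma + \sin^2\gamma\right)\right].
\end{aligned}$$

Figure 7.33 Probability of error for coherent and incoherent reception of two equally
likely orthogonal signals of energy $E_s$ (abscissa: $10\log_{10} E_s/\mathcal{N}_0$).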
Finally, since the right-hand side is independent of $m_1$ and $\gamma$, averaging
over $m$ and $\theta$ yields

$$P[\mathcal{E}] = \tfrac{1}{2}\, e^{-E_s/2\mathcal{N}_0}. \tag{7.68}$$
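
Equation 7.68 can be compared with a direct simulation of the decision rule of Eq. 7.62. The following Monte Carlo sketch draws the random phase and the four noise components of Eq. 7.61 and counts decision errors given that $m_1$ is sent. The trial count, seed, and value of $E_s/\mathcal{N}_0$ are arbitrary choices, and the estimate is subject to the usual statistical fluctuation.

```python
import math
import random


def simulate_incoherent(es_over_n0, trials, seed=1):
    """Estimate P[error | m_1] for the rule of Eq. 7.62 by simulation."""
    rng = random.Random(seed)
    n0 = 1.0                     # noise density normalized to one
    es = es_over_n0 * n0
    sd = math.sqrt(n0 / 2.0)     # each noise component has variance N0/2
    errors = 0
    for _ in range(trials):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r_c1 = math.sqrt(es) * math.cos(theta) + rng.gauss(0.0, sd)
        r_s1 = math.sqrt(es) * math.sin(theta) + rng.gauss(0.0, sd)
        r_c2 = rng.gauss(0.0, sd)
        r_s2 = rng.gauss(0.0, sd)
        # Error when the noise-only envelope is at least as large as the signal envelope.
        if r_c1 ** 2 + r_s1 ** 2 <= r_c2 ** 2 + r_s2 ** 2:
            errors += 1
    return errors / trials


es_over_n0 = 4.0
estimate = simulate_incoherent(es_over_n0, trials=20000)
theory = 0.5 * math.exp(-es_over_n0 / 2.0)   # Eq. 7.68
```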

It is interesting to compare this result for two equally likely orthogonal
lowpass signals with incoherent reception to the corresponding result for
coherent reception. In the coherent case we have from Eqs. 4.78 and
2.121

$$P[\mathcal{E}] = Q\left(\sqrt{\frac{E_s}{\mathcal{N}_0}}\right) < \frac{1}{\sqrt{2\pi E_s/\mathcal{N}_0}}\, e^{-E_s/2\mathcal{N}_0}. \tag{7.69}$$

For large $E_s/\mathcal{N}_0$ the bound of Eq. 7.69 is very tight and the error performances
of optimum coherent and incoherent receivers are exponentially
equivalent. Plots of the two error behaviors are given in Fig. 7.33.
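
The comparison plotted in Fig. 7.33 can be reproduced approximately from Eqs. 7.68 and 7.69. The sketch below tabulates the incoherent error probability, the coherent error probability, and the coherent upper bound at a few illustrative values of $E_s/\mathcal{N}_0$. The function names are assumptions made for this example.

```python
import math


def Q(x):
    """Gaussian tail probability Q(x), computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def p_incoherent(ratio):
    """Eq. 7.68 with ratio = Es/N0."""
    return 0.5 * math.exp(-ratio / 2.0)


def p_coherent(ratio):
    """Coherent reception of two orthogonal signals: Q(sqrt(Es/N0))."""
    return Q(math.sqrt(ratio))


def p_coherent_bound(ratio):
    """Upper bound on the coherent error probability from Eq. 7.69."""
    return math.exp(-ratio / 2.0) / math.sqrt(2.0 * math.pi * ratio)


rows = []
for db in (0, 5, 10, 15):
    ratio = 10.0 ** (db / 10.0)
    rows.append((db, p_incoherent(ratio), p_coherent(ratio), p_coherent_bound(ratio)))

for db, p_inc, p_coh, p_bnd in rows:
    print(f"{db:2d} dB  incoherent {p_inc:.3e}  coherent {p_coh:.3e}  bound {p_bnd:.3e}")
```

All three quantities share the exponential factor $e^{-E_s/2\mathcal{N}_0}$, which is the sense in which the two receivers are exponentially equivalent.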



Insight into the near equivalence of the coherent and incoherent cases
is provided by the phasor diagram in Fig. 7.34. In accordance with Eqs.
7.57, 7.58, and 7.61, we can think of the output of the signal-excited
bandpass matched filter in Fig. 7.31 (scaled by the normalizing factor
$1/\sqrt{E_s}$) as being represented for $t \approx T$ by a signal phasor of length $\sqrt{E_s}$
and phase $\theta$ to which is added vectorially a random noise phasor. The
resultant phasor rotates with angular frequency $\omega_0 = 2\pi f_0$, and the
normalized filter output is the projection on the horizontal axis.

The noise phasor can be resolved into two components, one in phase
with and one in quadrature with the signal phasor. Since the lowpass
noise processes $n_c(t)$ and $n_s(t)$ in the bandpass noise representation of
Eq. 7.28 are stationary and a change in the signal phase corresponds to a
shift in time origin, the statistical properties of the in-phase and quadrature
noise components are the same regardless of the value of $\theta$. Thus
these narrow-band filtered lowpass noise phasor components vary slowly
in length, each having the amplitude of a statistically independent
Gaussian process with mean power $\mathcal{N}_0/2$ and rms amplitude $\sqrt{\mathcal{N}_0/2}$.
In the case of coherent reception, the phase $\theta$ is known and the optimum
receiver observes only in-phase noise components. On the other hand,
the incoherent receiver observes the envelope of the filter outputs. For
the signal-excited filter, this corresponds to the length of the total signal-plus-noise
phasor. When $\sqrt{E_s/\mathcal{N}_0} \gg 1$, however, both the in- and out-of-phase
noise components are usually much smaller than the signal component.
It is clear from the geometry of Fig. 7.34 that the amplitude of
the envelope is then affected primarily by the in-phase noise alone.

We may also consider the bandpass matched filter whose output is
noise alone in much the same way. The length of the vector sum of two
orthogonal Gaussian vectors can be large (in relation to the signal) only
if one or both of its components are large. With weak noise, the probability
of such an event is not substantially larger than that which would
obtain if we observed only a single noise component, as with the coherent
receiver. Thus lack of phase information does not seriously affect the
statistical nature of the output from either of the two orthogonal matched
filters when $E_s/\mathcal{N}_0$ is large. This explains why envelope detection yields
an error performance that is not seriously degraded under high energy-to-noise
ratio conditions.

Although coherent and incoherent receivers yield comparable error 
performance when one of two equally likely orthogonal signals is trans- 
mitted in weak noise, it does not follow that complete lack of phase 
knowledge is not costly. Indeed, when coherent reception is possible, the 
two signals may be chosen to be antipodal rather than orthogonal, which 
saves 3 db. The 3 db may also be saved when the channel phase varies 
slowly enough so that phase continuity between successive transmissions 
can be exploited, even though the absolute phase may be unknown. One 
way to accomplish this goal is described in the next two sections. 

Phase comparison and channel measurement. An interesting strategy
for communicating binary data over a random-phase channel is to transmit
a known reference waveform, say $q(t)\sqrt{2}\cos\omega_0 t$, to measure the phase
of the channel, and then to transmit the message by means of antipodal
signals $\pm s_0(t)\sqrt{2}\cos\omega_0 t$, corresponding to $m = m_1$ and $m = m_2$, respectively.
Note that these message signals are optimum if the phase is
known but are indistinguishable if the phase is uniformly distributed over
$[0, 2\pi]$, which points out the need for the phase-reference measurement.

We now consider the particular case in which the lowpass waveforms
$q(t)$ and $s_0(t)$ are orthogonal, each with energy $\tfrac{1}{2}E_s$. One possible choice of
signals, with corresponding vector representation, is illustrated in Fig. 7.35.

It is immediately apparent from the vector representation that the
total transmitted waveform (reference waveform plus signal) may be
generated in a single step by appropriately supplying either

$$s_1(t) = q(t) + s_0(t) \tag{7.70a}$$

or

$$s_2(t) = q(t) - s_0(t) \tag{7.70b}$$

to the input of the transmitter modulator. The two signals $s_1(t)$ and
$s_2(t)$ are orthogonal and each has energy $E_s$. Thus there is no distinction
(other than point of view) between the signal set here and in the envelope
detector analysis encompassing Eq. 7.59.

Figure 7.35 Signals for channel measurements and phase-comparison reception.


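It remains to be shown how this point of view may be extended to the
design of an optimum receiver. We now prove that an appropriate phase-reference
and comparison receiver always makes the same decision as the
optimum envelope-detector receiver. First, from Eq. 7.60 we note that the
latter receiver sets $\hat{m} = m_1$ if and only if

$$(\mathbf{r}_c\cdot\mathbf{s}_1)^2 + (\mathbf{r}_s\cdot\mathbf{s}_1)^2 > (\mathbf{r}_c\cdot\mathbf{s}_2)^2 + (\mathbf{r}_s\cdot\mathbf{s}_2)^2,$$

which may be rewritten as

$$[\mathbf{r}_c\cdot(\mathbf{q} + \mathbf{s}_0)]^2 + [\mathbf{r}_s\cdot(\mathbf{q} + \mathbf{s}_0)]^2 > [\mathbf{r}_c\cdot(\mathbf{q} - \mathbf{s}_0)]^2 + [\mathbf{r}_s\cdot(\mathbf{q} - \mathbf{s}_0)]^2.$$

Simplifying, we observe that the optimum receiver sets $\hat{m} = m_1$ if and only
if

$$(\mathbf{r}_c\cdot\mathbf{q})(\mathbf{r}_c\cdot\mathbf{s}_0) + (\mathbf{r}_s\cdot\mathbf{q})(\mathbf{r}_s\cdot\mathbf{s}_0) > 0. \tag{7.71}$$

Finally, if, as illustrated in Fig. 7.36, we define the two new vectors

$$\mathbf{Q} \triangleq (\mathbf{r}_c\cdot\mathbf{q},\; \mathbf{r}_s\cdot\mathbf{q}) \tag{7.72a}$$

$$\mathbf{S}_0 \triangleq (\mathbf{r}_c\cdot\mathbf{s}_0,\; \mathbf{r}_s\cdot\mathbf{s}_0), \tag{7.72b}$$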
then Eq. 7.71 may be written

$$\mathbf{Q}\cdot\mathbf{S}_0 > 0. \tag{7.72c}$$

Figure 7.36 Vectors for phase-comparison decision. The optimum receiver sets
$\hat{m} = m_1$ whenever $\mathbf{S}_0$ lies to the right of the dashed line.

The interpretation of the optimum decision rule of Eq. 7.72c is clarified
in Fig. 7.37. First, from Eq. 7.58c and the discussion leading to Eq. 7.57
we note that the angle (phase) of $\mathbf{Q}$ is just the phase at $t = T$ of the
output of a filter “matched” to $q(t)\sqrt{2}\cos\omega_0 t$ and the angle (phase)
of $\mathbf{S}_0$ is just the phase at $t = T$ of the output of a filter “matched” to
$s_0(t)\sqrt{2}\cos\omega_0 t$. Equation 7.72c implies that an optimum decision rule is to set
$\hat{m} = m_1$ if and only if the magnitude of the phase difference between $\mathbf{Q}$
and $\mathbf{S}_0$ is less than 90°. Thus a receiver that measures the phase of $\mathbf{Q}$ and
uses it as a reference for comparison against the phase of $\mathbf{S}_0$ is optimum.

When $E_s/\mathcal{N}_0$ is large, the measured-reference phase is with high probability
nearly the same as the actual phase of the channel, which provides
another interpretation of the smallness of the degradation of performance
sustained by the incoherent receiver under these conditions.
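
The equivalence of the envelope comparison of Eq. 7.60 and the phase comparison of Eq. 7.72c follows from the identity $X_1^2 - X_2^2 = 4\,\mathbf{Q}\cdot\mathbf{S}_0$, and it can be checked on random inputs. In the sketch below the vectors for $q(t)$ and $s_0(t)$, the range of the random demodulator outputs, and the function names are hypothetical choices.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def envelope_rule(r_c, r_s, q, s0):
    """True when Eq. 7.60 selects m_1, with s_1 = q + s0 and s_2 = q - s0."""
    s1 = tuple(a + b for a, b in zip(q, s0))
    s2 = tuple(a - b for a, b in zip(q, s0))
    x1_sq = dot(r_c, s1) ** 2 + dot(r_s, s1) ** 2
    x2_sq = dot(r_c, s2) ** 2 + dot(r_s, s2) ** 2
    return x1_sq > x2_sq


def phase_comparison_rule(r_c, r_s, q, s0):
    """True when Q . S_0 > 0 (Eq. 7.72c)."""
    q_vec = (dot(r_c, q), dot(r_s, q))
    s_vec = (dot(r_c, s0), dot(r_s, s0))
    return dot(q_vec, s_vec) > 0.0


es = 2.0
q = (math.sqrt(es / 2.0), 0.0)    # reference, energy Es/2 (hypothetical choice)
s0 = (0.0, math.sqrt(es / 2.0))   # signal, orthogonal to q, energy Es/2

rng = random.Random(7)
agreements = []
for _ in range(1000):
    r_c = (rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))
    r_s = (rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0))
    agreements.append(envelope_rule(r_c, r_s, q, s0) ==
                      phase_comparison_rule(r_c, r_s, q, s0))
```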



Figure 7.37 Matched-filter and phase-detector receiver. As usual, the random (but
time-invariant) channel phase $\theta$ has been ascribed to the modulator. The filters are
matched, except for phase, to the transmitted signals:

$$h_q(t) = q(T - t)\sqrt{2}\cos\omega_0 t,$$
$$h_s(t) = s_0(T - t)\sqrt{2}\cos\omega_0 t.$$


Differential phase-shift keying. A practical system 22 exploits the 
reference-phase idea in a clever fashion to transmit a sequence of binary 
messages over a channel whose phase changes very slowly. To transmit 
the first binary message, a signal is preceded by a reference. For the second 
binary message, the signal portion of the first transmission is used as 
reference and only the new signal is sent. This scheme is continued with 
each signal serving as reference for the next, as illustrated in Fig. 7.38. 
If at any time $m_1$ is to be transmitted, the phase is left unchanged from the
preceding transmission, whereas if $m_2$ is to be transmitted, the phase is
changed by 180°. Decisions are made on the basis of whether the new
received phase and the preceding received phase differ in magnitude by
more or less than 90°. It follows that as long as negligible channel phase
drift occurs between successive transmissions the error probability is
the same as that of the incoherent reception of orthogonal signals with
energy equal to that of the reference plus signal, say $E$. From Eq. 7.68

$$P[\mathcal{E}] = \tfrac{1}{2}\,e^{-E/2\mathcal{N}_0}.$$

But for differential phase-shift keying, the energy actually used in the
transmission of each binary message is just that of the signal portion,
$E_s = E/2$. Hence

$$P[\mathcal{E}] = \tfrac{1}{2}\,e^{-E_s/\mathcal{N}_0}, \tag{7.73}$$

and a 3-db improvement has been obtained by the double use of energy. 
For large values of $E_s/\mathcal{N}_0$ and slow phase drift, the degradation in average
performance from that obtained with an optimum coherent binary system 
by using antipodal signals is negligible. The major distinction is that errors 
tend to occur in pairs, since an error on one message implies a high 
probability of having received a bad noise, which in turn implies a poor 
reference for the next decision. 
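The encoding and decision rules of differential phase-shift keying can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, phases are handled in degrees, and the channel is assumed to add a fixed unknown phase and no noise. The two probability functions simply restate Eq. 7.68 and Eq. 7.73.

```python
import math


def dpsk_encode(messages, reference_phase=0.0):
    """Map messages (1 or 2) to transmitted phases in degrees.

    The first phase returned is the initial reference; message 1 leaves the
    phase unchanged and message 2 changes it by 180 degrees.
    """
    phases = [reference_phase % 360.0]
    for m in messages:
        shift = 0.0 if m == 1 else 180.0
        phases.append((phases[-1] + shift) % 360.0)
    return phases


def dpsk_decode(received_phases):
    """Decide 1 if successive phases differ by less than 90 degrees, else 2."""
    decisions = []
    for prev, new in zip(received_phases, received_phases[1:]):
        diff = (new - prev + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
        decisions.append(1 if abs(diff) < 90.0 else 2)
    return decisions


def incoherent_orthogonal_error(e_over_n0):
    """Eq. 7.68: incoherent reception of two orthogonal signals of energy E."""
    return 0.5 * math.exp(-e_over_n0 / 2.0)


def dpsk_error(es_over_n0):
    """Eq. 7.73: DPSK with signal-portion energy E_s = E/2."""
    return 0.5 * math.exp(-es_over_n0)
```

Because only phase differences enter the decision, an arbitrary constant channel phase added to every received phase leaves the decoded sequence unchanged.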

7.4 FADING CHANNELS 

We have just treated situations in which either the phase or the amplitude
of the received signal is random. We now treat one in which both
phase and amplitude are random. In particular, we shall investigate the
design and performance of optimum receivers for the case in which, if

$$s^\circ(t) = s(t)\sqrt{2}\cos\omega_0 t \tag{7.74a}$$

is transmitted, the received signal in the absence of additive noise is

$$r^\circ(t) = a\,s(t)\sqrt{2}\cos(\omega_0 t - \theta). \tag{7.74b}$$

Figure 7.38 Differential-phase-shift-keyed waveforms for message sequence $m_1, m_1,
m_2, m_1$. For each message the reference waveform is the dashed pulse, and the signal
waveform is the solid one. The energy of each pulse is $E_s$.

In Eq. 7.74b we assume that $s(t)$ is a lowpass modulating signal with
bandwidth $W \ll \omega_0/2\pi$ and that the joint probability density function of
the gain and phase parameters $(a, \theta)$ is

$$p_{a,\theta}(\alpha, \phi) = \begin{cases} \dfrac{\alpha}{\pi b}\,e^{-\alpha^2/b}; & \alpha \ge 0,\ 0 \le \phi < 2\pi \\ 0; & \text{elsewhere.} \end{cases} \tag{7.75a}$$

Thus $a$ and $\theta$ are statistically independent, with

$$p_a(\alpha) = \begin{cases} \dfrac{2\alpha}{b}\,e^{-\alpha^2/b}; & \alpha \ge 0 \\ 0; & \alpha < 0, \end{cases} \tag{7.75b}$$

$$p_\theta(\phi) = \frac{1}{2\pi}; \qquad 0 \le \phi < 2\pi. \tag{7.75c}$$

The transmission gain $a$ is Rayleigh distributed with

$$\bar{a} = \tfrac{1}{2}\sqrt{\pi b}, \tag{7.76a}$$

$$\overline{a^2} = b > 0. \tag{7.76b}$$


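Thus if the transmitted energy is

$$\int_{-\infty}^{\infty} [s^\circ(t)]^2\,dt = \int_{-\infty}^{\infty} s^2(t)\,dt \triangleq E_s, \tag{7.77a}$$

the mean received signal energy is

$$E\!\left[\int_{-\infty}^{\infty} [r^\circ(t)]^2\,dt\right] = E[a^2 E_s] = bE_s \triangleq \bar{E}_s. \tag{7.77b}$$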

Scattering Model 

It is instructive to investigate the circumstances under which the 
input-output relation given by Eqs. 7.74a and b is a reasonable model for 
an actual communication problem. Consider the situation shown in 
Fig. 7.39, in which there is a large number of “scatterers” located at 
random points within the propagation path. Let the component received 
from the $j$th scatterer be

$$r_j^\circ(t) \triangleq c_j\,s(t - \tau_j)\sqrt{2}\cos\omega_0(t - \tau_j).$$

The total noiseless received signal is then

$$r^\circ(t) = \sum_{\text{all } j} r_j^\circ(t).$$

If the delays $\{\tau_j\}$ are all small in relation to the reciprocal bandwidth of
$s(t)$ but comparable to $2\pi/\omega_0$, then we can write†

$$r^\circ(t) \approx s(t)\sum_{\text{all } j} c_j\sqrt{2}\cos(\omega_0 t - \phi_j), \tag{7.78}$$

in which $\phi_j \triangleq \omega_0\tau_j$. Equation 7.78 is predicated on the fact that $s(t)$, being
narrow-band, cannot change significantly over the time interval spanned
by the $\{\tau_j\}$, so that $s(t - \tau_j) \approx s(t)$ for all $j$.

If $r^\circ(t)$ is fed into sine and cosine DSB demodulators such as those in


Figure 7.39 Scattering of transmitted signal.

Fig. 7.14b, the two outputs are

$$r_c(t) = s(t)\sum_j c_j\cos\phi_j \tag{7.79a}$$

and

$$r_s(t) = s(t)\sum_j c_j\sin\phi_j. \tag{7.79b}$$

We now investigate the statistical properties of the parameters

$$z_c \triangleq \sum_j c_j\cos\phi_j, \tag{7.80a}$$

$$z_s \triangleq \sum_j c_j\sin\phi_j. \tag{7.80b}$$

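† In Eq. 7.78 we neglect the gross delay that is due to the average distance over which
the signal must propagate. Such delay amounts to a shift in time origin. Only
incremental delays are pertinent to the analysis.

The simplest case, and the only one that we consider, occurs when
the $\{\phi_j\}$ are statistically independent and each is uniformly distributed
over $[0, 2\pi]$. Assume initially that the $\{c_j\}$ are identically distributed
random variables, each of which is statistically independent of the others
and of the $\{\phi_j\}$. Then the central limit theorem states that both $z_c$ and $z_s$
are approximately Gaussian whenever the number of scatterers is large.
Moreover, by virtue of the statistical independence of the $\{c_j\}$ and the
$\{\phi_j\}$, we have

$$\bar{z}_c = \sum_j \bar{c}_j\,\overline{\cos\phi_j}, \qquad \bar{z}_s = \sum_j \bar{c}_j\,\overline{\sin\phi_j},$$

$$\overline{z_c^2} = \sum_k\sum_j \overline{c_k c_j}\;\overline{\cos\phi_k\cos\phi_j},$$

$$\overline{z_s^2} = \sum_k\sum_j \overline{c_k c_j}\;\overline{\sin\phi_k\sin\phi_j},$$

$$\overline{z_c z_s} = \sum_k\sum_j \overline{c_k c_j}\;\overline{\cos\phi_k\sin\phi_j}, \tag{7.81}$$

$$\overline{\sin\phi_j} = \overline{\cos\phi_j} = \int_0^{2\pi}\frac{1}{2\pi}\cos\alpha\,d\alpha = 0,$$

$$\overline{\cos\phi_j\sin\phi_j} = \int_0^{2\pi}\frac{1}{2\pi}\cos\alpha\sin\alpha\,d\alpha = 0,$$

$$\overline{\sin^2\phi_j} = \overline{\cos^2\phi_j} = \int_0^{2\pi}\frac{1}{2\pi}\cos^2\alpha\,d\alpha = \tfrac{1}{2},$$

$$\left.\begin{aligned}
\overline{\cos\phi_k\sin\phi_j} &= \overline{\cos\phi_k}\;\overline{\sin\phi_j} = 0 \\
\overline{\cos\phi_k\cos\phi_j} &= \overline{\cos\phi_k}\;\overline{\cos\phi_j} = 0 \\
\overline{\sin\phi_k\sin\phi_j} &= \overline{\sin\phi_k}\;\overline{\sin\phi_j} = 0
\end{aligned}\right\} \quad k \ne j,$$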

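in which we have exploited the statistical independence of the $\{\phi_j\}$. Thus
Eqs. 7.81 become

$$\bar{z}_c = \bar{z}_s = 0, \tag{7.82a}$$

$$\overline{z_c^2} = \overline{z_s^2} = \tfrac{1}{2}\sum_j \overline{c_j^2}, \tag{7.82b}$$

$$\overline{z_c z_s} = 0. \tag{7.82c}$$

We see that $z_c$ and $z_s$ are not only Gaussian but are also statistically
independent of each other and have zero means and equal variances. It
follows that

$$p_{z_c,z_s}(\alpha, \beta) = \frac{1}{\pi b}\,e^{-(\alpha^2+\beta^2)/b}, \tag{7.83a}$$

in which

$$b \triangleq \sum_j \overline{c_j^2} = 2\,\overline{z_c^2} = 2\,\overline{z_s^2}. \tag{7.83b}$$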


It is interesting to note from Eqs. 7.81 that the central limit theorem
implies that the joint density function is given by Eqs. 7.83 even if
the $\{c_j\}$ are constants, say

$$c_j = c; \qquad \text{for all } j.$$

Thus the critical conditions required to justify modeling $z_c$ and $z_s$ by
statistically independent Gaussian random variables are that the $\{\phi_j\}$ be
uniformly distributed and statistically independent and that the number
of scatterers be very large. The assumption that these conditions are met
is valid in many cases of practical interest, in particular with tropospheric-
and ionospheric-scattering channels.

It is now easy to see that the density function of Eq. 7.83 leads to the
input-output relation given by Eqs. 7.74 and 7.75. We can recompose
$r^\circ(t)$ from the quadrature components of Eq. 7.79 and write

$$r^\circ(t) = s(t)\left[z_c\sqrt{2}\cos\omega_0 t + z_s\sqrt{2}\sin\omega_0 t\right] = s(t)\sqrt{2}\,a\cos(\omega_0 t - \theta), \tag{7.84a}$$

in which

$$a \triangleq \sqrt{z_c^2 + z_s^2}, \tag{7.84b}$$

$$\theta \triangleq \tan^{-1}\frac{z_s}{z_c}. \tag{7.84c}$$

But we have already observed (Eq. 2A.11 with $\rho = 0$) that the density
function $p_{z_c,z_s}$ of Eq. 7.83a implies the density function $p_{a,\theta}$ of Eq. 7.75a.
Thus our scattering-channel model leads to a received signal with a
uniformly distributed phase and a Rayleigh-distributed amplitude.
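The scattering argument can be checked with a small Monte Carlo experiment. The sketch below is illustrative and makes assumptions beyond the text: all scatterers have the same constant strength $c_j = c = 1/\sqrt{n}$ (so that $b = \sum_j c_j^2 = 1$), and the function names and the default of 50 scatterers are hypothetical choices. For a Rayleigh gain with $\overline{a^2} = b$, Eq. 7.75b gives $P[a^2 > b] = e^{-1}$, which serves as a simple fit check.

```python
import math
import random


def simulate_gain_squared(num_scatterers, strength, rng):
    """One sample of a^2 = z_c^2 + z_s^2 (Eqs. 7.80 and 7.84b)."""
    z_c = 0.0
    z_s = 0.0
    for _ in range(num_scatterers):
        phi = rng.uniform(0.0, 2.0 * math.pi)  # independent, uniform phase
        z_c += strength * math.cos(phi)
        z_s += strength * math.sin(phi)
    return z_c * z_c + z_s * z_s


def estimate_rayleigh_fit(num_scatterers=50, trials=4000, seed=1):
    """Return (sample mean of a^2, fraction of samples with a^2 > b) for b = 1."""
    rng = random.Random(seed)
    strength = 1.0 / math.sqrt(num_scatterers)  # makes b = sum of c_j^2 equal 1
    samples = [simulate_gain_squared(num_scatterers, strength, rng)
               for _ in range(trials)]
    mean_sq = sum(samples) / trials
    frac_above = sum(1 for x in samples if x > 1.0) / trials
    return mean_sq, frac_above
```

The first returned value should lie near $b = 1$ and the second near $e^{-1} \approx 0.368$; agreement improves as the number of scatterers and trials grows.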

Single Transmission 

We now consider the simplest case involving both fading and the
addition of white Gaussian noise, in which $s(t)$ represents the single
transmission of one of $M$ equally likely lowpass signals $\{s_i(t)\}$. The total
received signal is therefore

$$r(t) = a\,s(t)\sqrt{2}\cos(\omega_0 t - \theta) + n_w(t). \tag{7.85}$$

Determination of the optimum receiver structure is straightforward.
We first observe that if $a$ were known, the optimum receiver would simply
act as if the modulating signal set had been $\{a\,s_i(t)\}$ and in accordance with
Eq. 7.50 would therefore determine that $i$ for which

$$e^{-a^2E_i/\mathcal{N}_0}\,I_0\!\left(\frac{2aX_i}{\mathcal{N}_0}\right) \tag{7.86a}$$

is maximum. Here $E_i$ denotes the energy of $s_i(t)$ and as before the quantity

$$X_i = \sqrt{\left[\int_{-\infty}^{\infty} r_c(t)\,s_i(t)\,dt\right]^2 + \left[\int_{-\infty}^{\infty} r_s(t)\,s_i(t)\,dt\right]^2} \tag{7.86b}$$

may be identified as the sampled envelope of a bandpass filter matched
to $s_i(t)\sqrt{2}\cos\omega_0 t$. When all $E_i$ are equal, $i = 0, 1, \ldots, M - 1$, the
decision implied by this rule is the same regardless of the specific (positive)
value assumed by $a$. In this case the envelope detector of Fig. 7.31 is still
an optimum receiver, even though $a$ is now a (positive) random variable.

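Probability of error. For $M = 2$ and orthogonal signals of energy $E_s$,
the probability of error is also easy to determine. From Eq. 7.68 we have

$$P[\mathcal{E} \mid a] = \tfrac{1}{2}\,e^{-a^2E_s/2\mathcal{N}_0}. \tag{7.87}$$

But

$$a^2 = z_c^2 + z_s^2,$$

and $z_c$ and $z_s$ are statistically independent Gaussian random variables with
zero mean and variance $b/2$. Accordingly,

$$P[\mathcal{E}] = \tfrac{1}{2}\;\overline{\exp\!\left(-\frac{E_s}{2\mathcal{N}_0}z_c^2\right)}\;\overline{\exp\!\left(-\frac{E_s}{2\mathcal{N}_0}z_s^2\right)}.$$

By invoking the lemma of Eq. 7.67, with $w = -E_s/2\mathcal{N}_0$, $\sigma_x^2 = b/2$, and
$m_x = 0$, we have

$$P[\mathcal{E}] = \frac{1}{2 + \bar{E}_s/\mathcal{N}_0}, \tag{7.88}$$

in which $\bar{E}_s = bE_s$ is the mean value of the received energy.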

Discussion. Equation 7.88 states that the minimum attainable error
probability in communicating one of two equally likely orthogonal signals 
over a Rayleigh-fading channel decreases only inversely with the trans- 
mitted energy. This behavior is in marked contrast to the nonfading case, 
in which the error probability decreases exponentially with E s . The differ- 
ence in performance is attributable to the fact that even when the average 
received energy on a fading channel is high there is still an appreciable 
probability that the actual energy received on any given transmission is 
quite small; that is, there is an appreciable probability of a “deep fade.” 
This is evident in the plot of the Rayleigh density function in Fig. 2.21b. 
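The contrast between the inverse and the exponential dependence on energy is easy to see numerically. The following sketch (hypothetical function names; a plain midpoint-rule integration is assumed to be accurate enough) evaluates Eq. 7.68 and Eq. 7.88 and also recovers Eq. 7.88 by averaging the conditional error probability of Eq. 7.87 over the distribution of $a^2$, which is exponential with mean $b$ when $a$ is Rayleigh.

```python
import math


def error_prob_nonfading(es_over_n0):
    """Eq. 7.68: binary orthogonal signals, unknown phase, no fading."""
    return 0.5 * math.exp(-es_over_n0 / 2.0)


def error_prob_rayleigh(mean_es_over_n0):
    """Eq. 7.88: the same signals over a Rayleigh-fading channel."""
    return 1.0 / (2.0 + mean_es_over_n0)


def error_prob_rayleigh_numeric(mean_es_over_n0, steps=100000, upper=40.0):
    """Average Eq. 7.87 over u = a^2/b, which has density exp(-u) for u >= 0."""
    du = upper / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * du  # midpoint of the k-th integration cell
        conditional = 0.5 * math.exp(-u * mean_es_over_n0 / 2.0)
        total += conditional * math.exp(-u) * du
    return total
```

At a mean energy-to-noise ratio of 100 (20 db), for instance, `error_prob_rayleigh` returns about $10^{-2}$, whereas `error_prob_nonfading` is smaller than $10^{-21}$.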


Diversity Transmission 

The only efficient way to reduce the error probability with a Rayleigh- 
fading channel is to circumvent the high probability of a deep fade on a 
single transmission. This is accomplished by means of diversity transmission.
The idea of diversity is simple; scattering channels of practical
interest are characterized by the fact that the scattering elements in
Fig. 7.39 move randomly with respect to one another as time goes on.
Thus the received signal amplitude and phase are actually random processes,
say, $a(t)$ and $\theta(t)$. Of course, at any stated observation instant $t_1$,
the parameters $a(t_1)$ and $\theta(t_1)$ are Rayleigh and uniformly distributed
random variables, respectively. The preceding analysis of a single transmission
is therefore valid whenever the duration of the signal $s(t)$ is short
enough compared with the rate of variation of $a(t)$ and $\theta(t)$ that these
processes are relatively constant over the signaling interval. Over an
extended period of time, however, we anticipate that the observed values
of $a(t)$ will fluctuate, being sometimes large and sometimes small. One
form of diversity, called time diversity, involves sending the same signal
$s(t)$ over and over again, say $L$ times, in the hope that not all of the
transmissions will be subjected to deep fades.

The objective of time diversity is to space successive transmissions in
time in such a way that the fading experienced by each transmission is
statistically independent. Let the instants at which successive transmissions
begin be $\{t_l\}$, $l = 1, 2, \ldots, L$. The value of $a(t)$ observed at
any instant depends primarily on the phase relationships then existing
between the mutually interfering incremental wavelets received from
individual scatterers. Accordingly, if the $\{t_l\}$ are spaced far enough
apart so that the phase samples

$$\theta(t_1),\ \theta(t_2),\ \ldots,\ \theta(t_L)$$

are statistically independent, so also are the corresponding gain samples
$a(t_1), a(t_2), \ldots, a(t_L)$.

We therefore assume for purposes of analysis that

$$p_{\mathbf{a},\boldsymbol{\theta}} = \prod_{l=1}^{L} p_{a_l,\theta_l}, \tag{7.89a}$$

where

$$a_l \triangleq a(t_l), \qquad \theta_l \triangleq \theta(t_l), \tag{7.89b}$$

$$\mathbf{a} \triangleq (a_1, a_2, \ldots, a_L), \tag{7.89c}$$

$$\boldsymbol{\theta} \triangleq (\theta_1, \theta_2, \ldots, \theta_L). \tag{7.89d}$$

As in the single-transmission case, we also assume for $l = 1, 2, \ldots, L$ that

$$p_{a_l,\theta_l}(\alpha, \phi) = \begin{cases} \dfrac{\alpha}{\pi b_l}\,e^{-\alpha^2/b_l}; & \alpha \ge 0,\ 0 \le \phi < 2\pi \\ 0; & \text{elsewhere.} \end{cases} \tag{7.89e}$$

Equation 7.89e makes provision for the fact that the fading parameters
$\{b_l\}$ affecting different transmissions may be unequal. The final assumption
is that each of the $L$ diversity transmissions is sufficiently short
compared to the spacing between the $\{t_l\}$ that the fading during each
transmission may be considered constant.

Thus far we have considered only time diversity. Other methods of
obtaining $L$ different received signals are frequency and space diversity.
These techniques are discussed briefly at the end of this chapter. Since
the performance attainable with diversity depends only upon the
$2L$-dimensional joint probability density function $p_{\mathbf{a},\boldsymbol{\theta}}$, the analysis that
follows is independent of the particular diversity technique from which
$p_{\mathbf{a},\boldsymbol{\theta}}$ results.



Figure 7.40 DSB demodulation of $L$-fold time diversity transmission. When $m = m_k$
the baseband signal is

$$s(t) = \sum_{l=1}^{L} s_k(t - t_l).$$

For the scattering channel model under consideration, the resulting demodulator
outputs are

$$r_c(t) = \sum_{l=1}^{L} a_l\cos\theta_l\;s_k(t - t_l) + n_c(t),$$

$$r_s(t) = \sum_{l=1}^{L} a_l\sin\theta_l\;s_k(t - t_l) + n_s(t).$$

Optimum diversity receivers. When $L$-fold time diversity is used in
conjunction with DSB demodulation, as in Fig. 7.40, each transmission
results in two lowpass output waveforms: one each from the sine and the
cosine demodulators. Thus there is a total of $2L$ relevant lowpass waveforms
available at the receiver. We now consider the optimum way to
use these waveforms in the determination of $m$.

Assuming that the input message is $m_k$ and numbering the cosine
demodulator outputs consecutively from 1 to $L$ and the sine demodulator
outputs consecutively from $(L + 1)$ to $2L$, we may write these $2L$ waveforms
as

$$r_l(t) = \begin{cases} (a_l\cos\theta_l)\,s_k(t) + n_l(t); & l = 1, 2, \ldots, L \\ (a_{l-L}\sin\theta_{l-L})\,s_k(t) + n_l(t); & l = L+1, L+2, \ldots, 2L. \end{cases} \tag{7.90}$$

Here for notational simplicity, each of the $L$ transmissions has been
referred to its own time origin. It is important to remember, however,
that successive transmissions actually occur during disjoint time intervals,
so that the $\{n_l(t)\}$ represent statistically independent noise processes.

In accordance with Eq. 7.89, the $\{a_l\}$ are statistically independent
Rayleigh random variables with $\overline{a_l^2} = b_l$, and the $\{\theta_l\}$ are statistically
independent random variables, each of which is uniformly distributed
over $[0, 2\pi]$. If we define

$$z_l \triangleq \begin{cases} a_l\cos\theta_l; & l = 1, 2, \ldots, L \\ a_{l-L}\sin\theta_{l-L}; & l = L+1, L+2, \ldots, 2L, \end{cases} \tag{7.91}$$

Eq. 7.90 can be rewritten as

$$r_l(t) = z_l\,s_k(t) + n_l(t); \qquad l = 1, 2, \ldots, 2L. \tag{7.92}$$

The $\{z_l\}$ are statistically independent Gaussian random variables with
zero mean and variances

$$\overline{z_l^2} = \frac{b_l}{2}; \qquad l = 1, 2, \ldots, 2L, \tag{7.93a}$$

where we define

$$b_l \triangleq b_{l-L}; \qquad l = L+1, L+2, \ldots, 2L. \tag{7.93b}$$

Finally, in terms of an appropriate orthonormal set $\{\varphi_j(t)\}$, we have

$$s_i(t) = \sum_{j=1}^{N} s_{ij}\,\varphi_j(t); \qquad i = 0, 1, \ldots, M - 1, \tag{7.94}$$

so that the vector equivalent to Eq. 7.92 is

$$\mathbf{r}_l = z_l\,\mathbf{s}_k + \mathbf{n}_l; \qquad l = 1, 2, \ldots, 2L, \tag{7.95}$$

in which each vector has $N$ components.

As usual, the relevant part of the additive noise disturbance is encompassed
in the $\{\mathbf{n}_l\}$. Each component of $\mathbf{n}_l$, $l = 1, 2, \ldots, 2L$, is a
zero-mean Gaussian random variable with variance $\mathcal{N}_0/2$ and is statistically
independent of all other parameters of the problem.

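In formulating the optimum receiver, it is convenient to abbreviate
notation by defining the $2LN$-component vectors

$$\mathbf{r} \triangleq (\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_{2L}), \tag{7.96a}$$

$$\boldsymbol{\rho} \triangleq (\boldsymbol{\rho}_1, \boldsymbol{\rho}_2, \ldots, \boldsymbol{\rho}_{2L}). \tag{7.96b}$$

When $\mathbf{r} = \boldsymbol{\rho}$, the optimum receiver sets $\hat{m}$ equal to that $m_i$ for which
$P[m_i \mid \mathbf{r} = \boldsymbol{\rho}]$ is maximum. Assuming that all $\{m_i\}$ are equally likely, we
have

$$P[m_i \mid \mathbf{r} = \boldsymbol{\rho}] \sim p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i). \tag{7.97}$$

But the noise vectors $\{\mathbf{n}_l\}$ are statistically independent and

$$p_{\mathbf{n}_l} = p_{\mathbf{n}}; \qquad l = 1, 2, \ldots, 2L, \tag{7.98}$$

with

$$p_{\mathbf{n}}(\boldsymbol{\alpha}) = \frac{1}{(\pi\mathcal{N}_0)^{N/2}}\,e^{-|\boldsymbol{\alpha}|^2/\mathcal{N}_0}. \tag{7.99}$$

Given the transmission gains $\{z_l\}$, we therefore have

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, \{z_l\}) = \prod_{l=1}^{2L} p_{\mathbf{n}}(\boldsymbol{\rho}_l - z_l\,\mathbf{s}_i). \tag{7.100}$$

Since the $\{z_l\}$ are random variables, we obtain the likelihood $p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i)$
by averaging Eq. 7.100 over the $\{z_l\}$. Thus the optimum receiver determines
that $i$ which maximizes

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i) = \overline{p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, \{z_l\})} \sim \prod_{l=1}^{2L}\overline{\exp\!\left(-\frac{1}{\mathcal{N}_0}\,|\boldsymbol{\rho}_l - z_l\,\mathbf{s}_i|^2\right)}. \tag{7.101}$$

In the last line we have exploited the fact that the $\{z_l\}$ are statistically
independent and have discarded factors independent of $i$.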
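The exponent in Eq. 7.101 may be expanded as follows, with $E_i = |\mathbf{s}_i|^2$:

$$|\boldsymbol{\rho}_l - z_l\,\mathbf{s}_i|^2 = |\boldsymbol{\rho}_l|^2 - 2z_l(\boldsymbol{\rho}_l \cdot \mathbf{s}_i) + z_l^2E_i \tag{7.102a}$$

$$= |\boldsymbol{\rho}_l|^2 + E_i\left(z_l - \frac{\boldsymbol{\rho}_l \cdot \mathbf{s}_i}{E_i}\right)^2 - \frac{(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)^2}{E_i}. \tag{7.102b}$$

Introducing Eq. 7.102 into Eq. 7.101, discarding terms that are independent
of $i$, and defining $\rho_{li} \triangleq \boldsymbol{\rho}_l \cdot \mathbf{s}_i/E_i$ yields

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i) \sim \prod_{l=1}^{2L}\overline{\exp\!\left[-\frac{E_i}{\mathcal{N}_0}(z_l - \rho_{li})^2\right]}\;\exp\!\left(\rho_{li}^2\frac{E_i}{\mathcal{N}_0}\right). \tag{7.103}$$

Each of the $2L$ expectations in Eq. 7.103 may be evaluated by invoking
the lemma of Eq. 7.67, with $w = -E_i/\mathcal{N}_0$, $m_x = -\rho_{li}$, and $\sigma_x^2 = b_l/2$:

$$\overline{\exp\!\left[-\frac{E_i}{\mathcal{N}_0}(z_l - \rho_{li})^2\right]} = \frac{1}{\sqrt{1 + b_l(E_i/\mathcal{N}_0)}}\,\exp\!\left[-\frac{(E_i/\mathcal{N}_0)\,\rho_{li}^2}{1 + b_l(E_i/\mathcal{N}_0)}\right].$$

Since

$$\exp\!\left(\rho_{li}^2\frac{E_i}{\mathcal{N}_0}\right)\exp\!\left[-\frac{(E_i/\mathcal{N}_0)\,\rho_{li}^2}{1 + b_l(E_i/\mathcal{N}_0)}\right] = \exp\!\left[\rho_{li}^2\,\frac{b_l(E_i/\mathcal{N}_0)^2}{1 + b_l(E_i/\mathcal{N}_0)}\right],$$

Eq. 7.103 becomes

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i) \sim \prod_{l=1}^{2L}\frac{1}{\sqrt{1 + b_l(E_i/\mathcal{N}_0)}}\,\exp\!\left[\rho_{li}^2\,\frac{b_l(E_i/\mathcal{N}_0)^2}{1 + b_l(E_i/\mathcal{N}_0)}\right]. \tag{7.104}$$

Defining the weighting factors

$$w_{li} \triangleq \frac{b_l(E_i/\mathcal{N}_0)}{1 + b_l(E_i/\mathcal{N}_0)}; \qquad l = 1, 2, \ldots, 2L;\ i = 0, 1, \ldots, M - 1, \tag{7.105a}$$

and the bias constants

$$c_i \triangleq -\frac{\mathcal{N}_0}{2}\sum_{l=1}^{2L}\ln\left[1 + b_l(E_i/\mathcal{N}_0)\right]; \qquad i = 0, 1, \ldots, M - 1, \tag{7.105b}$$

we can write Eq. 7.104 in the simpler form

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i) \sim \exp\!\left\{\frac{1}{\mathcal{N}_0}\left[c_i + \frac{1}{E_i}\sum_{l=1}^{2L} w_{li}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)^2\right]\right\}. \tag{7.105c}$$

The optimum receiver therefore determines the index $i$ for which the
decision function

$$c_i + \frac{1}{E_i}\sum_{l=1}^{2L} w_{li}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)^2 \tag{7.106}$$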

is maximum.

If all $\{s_i(t)\}$ have the same energy, $E_i = E_s$, the $\{c_i\}$ are independent of $i$
and may be ignored. If in addition the mean-square gains $\{b_l\}$ are the
same for all transmissions, the weighting factors $\{w_{li}\}$ are the same for
all $l$ and $i$ and may also be ignored. The optimum decision rule then
becomes the following:

set $\hat{m} = m_i$ when $\mathbf{r}_l = \boldsymbol{\rho}_l$, $l = 1, 2, \ldots, 2L$, if and only if
$\sum_{l=1}^{2L}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)^2$ is maximum.

The quantity $\sum_{l=1}^{2L}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)^2$ may be computed by correlating the sine and
cosine demodulator outputs for each of the $L$ transmissions against the
lowpass signal $s_i(t)$, squaring, and summing. Alternatively, we recognize
from Eqs. 7.54 and 7.58 that for each transmission $l = 1, 2, \ldots, L$, the
quantity $[(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)^2 + (\boldsymbol{\rho}_{l+L} \cdot \mathbf{s}_i)^2]$ is the square of the envelope of the output
from a filter matched to $s_i(t)\sqrt{2}\cos\omega_0 t$. Thus an optimum receiver may
sum (over the $L$ transmissions) the squared-envelope samples from filters
matched to each of the $M$ bandpass signals and determine $\hat{m}$ in accordance
with whichever sum is largest. Such a receiver realization is diagrammed
in Fig. 7.41.
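For equal energies and equal mean-square gains, the rule reduces to picking the signal with the largest sum of squared correlations. A minimal Python sketch follows; the function names are hypothetical, and received and signal vectors are assumed to be given as tuples of their $N$ coefficients with respect to the orthonormal set.

```python
def dot(u, v):
    """Inner product of two equal-length coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


def square_law_decision(rho, signals):
    """Return the index i that maximizes sum over l of (rho_l . s_i)^2.

    rho     : list of the 2L received vectors (cosine outputs, then sine outputs)
    signals : list of the M equal-energy signal vectors
    """
    scores = [sum(dot(r, s) ** 2 for r in rho) for s in signals]
    return max(range(len(signals)), key=lambda i: scores[i])
```

Because each term is squared, the sign of each (unknown) gain $z_l$ does not affect the decision, which is why the receiver needs no phase reference.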

When the mean energy received on each of the $L$ diversity transmissions
is not equal, the squared-envelope samples must be weighted before summation.
For any transmission such that the ratio of the average received
energy and the noise power density, $\mathcal{N}_0$, is very small, we see from
Eq. 7.105a that the weighting factor $w_{li}$ is proportional thereto; on the
other hand, when $b_lE_i/\mathcal{N}_0$ becomes very large, $w_{li}$ approaches 1. If the
$\{E_i\}$ are not all equal, the bias constants $\{c_i\}$ are necessary if the receiver
decision $\hat{m}$ is to be optimum.
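The weighted rule can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the weighting factor is $w_{li} = x/(1+x)$ with $x = b_lE_i/\mathcal{N}_0$, and the bias constant is normalized as $c_i = -(\mathcal{N}_0/2)\sum_l \ln(1+x)$, so that the likelihood is proportional to the exponential of the decision function divided by $\mathcal{N}_0$. The function names are hypothetical, and the list `b` is assumed to hold all $2L$ values $b_l$ (with $b_{l+L} = b_l$).

```python
import math


def dot(u, v):
    """Inner product of two equal-length coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


def weight(b_l, e_i, n0):
    """Weighting factor: x / (1 + x) with x = b_l * E_i / N_0."""
    x = b_l * e_i / n0
    return x / (1.0 + x)


def bias(b, e_i, n0):
    """Bias constant c_i = -(N_0 / 2) * sum of ln(1 + b_l * E_i / N_0)."""
    return -0.5 * n0 * sum(math.log(1.0 + b_l * e_i / n0) for b_l in b)


def decision_function(rho, s_i, b, n0):
    """c_i + (1 / E_i) * sum over l of w_li * (rho_l . s_i)^2."""
    e_i = dot(s_i, s_i)
    weighted = sum(weight(b_l, e_i, n0) * dot(r, s_i) ** 2
                   for r, b_l in zip(rho, b))
    return bias(b, e_i, n0) + weighted / e_i
```

A receiver would evaluate `decision_function` once per candidate signal and choose the largest value; transmissions with small $b_lE_i/\mathcal{N}_0$ contribute little, and strong ones enter with weight close to 1.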

Receiver interpretation.$^{47}$ If the $2L$ transmission gains $\{z_l\}$ were not
random but instead were known to the receiver in advance, the optimum
receiving strategy with equally likely messages would be to determine that
$i$ for which $p_{\mathbf{r}}(\boldsymbol{\rho} \mid m_i, \{z_l\})$ is maximum. From Eqs. 7.100 and 7.102a the
receiver would therefore maximize

$$a_i + \sum_{l=1}^{2L} z_l(\boldsymbol{\rho}_l \cdot \mathbf{s}_i), \tag{7.107a}$$

where

$$a_i \triangleq -\frac{E_i}{2}\sum_{l=1}^{2L} z_l^2. \tag{7.107b}$$

On the other hand, when the $\{z_l\}$ are zero-mean Gaussian random
variables, we have seen in Eq. 7.106 that the optimum receiver maximizes

$$c_i + \frac{1}{E_i}\sum_{l=1}^{2L} w_{li}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)^2. \tag{7.108a}$$

This expression can be written in a form analogous to that of Eq. 7.107a
by defining

$$\hat{z}_{li} \triangleq \frac{w_{li}}{E_i}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i). \tag{7.108b}$$

We then have

$$\frac{1}{E_i}\sum_{l=1}^{2L} w_{li}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)^2 = \sum_{l=1}^{2L}\hat{z}_{li}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i). \tag{7.108c}$$

The interpretation of Eq. 7.108c is of interest. We can think of each
$\hat{z}_{li}$ as an estimate of the transmission gain $z_l$ based on the received signal
$\boldsymbol{\rho}_l$ and the hypothesis that $\mathbf{s}_i$ is transmitted. With this interpretation, the
optimum receiver can be visualized as first estimating each $z_l$, $l = 1, 2, \ldots,
2L$, on the basis of hypothesis $\mathbf{s}_i$ and the received signal $\boldsymbol{\rho}_l$ and then testing
this hypothesis (aside from the bias term $c_i$) as if each $z_l$ were actually
known to equal the estimated value $\hat{z}_{li}$. Since the appropriate estimate
of each $z_l$ is in general different for each hypothesis $\mathbf{s}_i$, the implementation
of a receiver that actually proceeded in this way would be complicated.
The visualization, however, provides insight into how a “good” receiver
might be constructed for use in certain communication situations wherein
the $\{z_l\}$ are not Gaussian.

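We now show in what sense the $\{\hat{z}_{li}\}$ are estimates of the $\{z_l\}$. In
particular, we show that

$$\hat{z}_{li} = E[z_l \mid m_i, \mathbf{r}_l = \boldsymbol{\rho}_l]; \qquad 1 \le l \le 2L,\ 0 \le i \le M - 1, \tag{7.109}$$

in which the right-hand side means “the conditional mean of $z_l$, given $m_i$
and $\mathbf{r}_l = \boldsymbol{\rho}_l$.”

Proof of Eq. 7.109 is straightforward. We first determine how
$p_{z_l}(\beta \mid m_i, \mathbf{r}_l = \boldsymbol{\rho}_l)$ varies as a function of $\beta$. Discarding factors that are
independent of $\beta$, we have

$$p_{z_l}(\beta \mid m_i, \mathbf{r}_l = \boldsymbol{\rho}_l) = \frac{p_{\mathbf{r}_l}(\boldsymbol{\rho}_l \mid m_i, z_l = \beta)\;p_{z_l}(\beta \mid m_i)}{p_{\mathbf{r}_l}(\boldsymbol{\rho}_l \mid m_i)}$$

$$\sim p_{\mathbf{r}_l}(\boldsymbol{\rho}_l \mid m_i, z_l = \beta)\;p_{z_l}(\beta \mid m_i)$$

$$\sim e^{-|\boldsymbol{\rho}_l - \beta\mathbf{s}_i|^2/\mathcal{N}_0}\;e^{-\beta^2/b_l}. \tag{7.110}$$

The last line follows from Eqs. 7.93, 7.101, and the fact that $z_l$ is statistically
independent of the transmitted message. Completing the square in
the exponent of Eq. 7.110 and again discarding factors independent of
$\beta$ yields

$$p_{z_l}(\beta \mid m_i, \mathbf{r}_l = \boldsymbol{\rho}_l) \sim \exp\!\left\{-\frac{1 + b_l(E_i/\mathcal{N}_0)}{b_l}\left[\beta - (\boldsymbol{\rho}_l \cdot \mathbf{s}_i)\,\frac{b_l/\mathcal{N}_0}{1 + b_l(E_i/\mathcal{N}_0)}\right]^2\right\}.$$

From the definition of $w_{li}$ in Eq. 7.105a an equivalent expression is

$$p_{z_l}(\beta \mid m_i, \mathbf{r}_l = \boldsymbol{\rho}_l) \sim \exp\!\left\{-\frac{E_i}{w_{li}\,\mathcal{N}_0}\left[\beta - \frac{w_{li}}{E_i}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)\right]^2\right\}. \tag{7.111a}$$

Since the functional dependence on $\beta$ in Eq. 7.111a is of Gaussian form,
$p_{z_l}(\ \mid m_i, \mathbf{r}_l = \boldsymbol{\rho}_l)$ is a Gaussian probability density function. It follows
immediately that the conditional mean of $z_l$ is

$$E[z_l \mid m_i, \mathbf{r}_l = \boldsymbol{\rho}_l] = \frac{w_{li}}{E_i}(\boldsymbol{\rho}_l \cdot \mathbf{s}_i) = \hat{z}_{li} \tag{7.111b}$$

and that the conditional variance is $w_{li}(\mathcal{N}_0/2E_i)$. Thus $\hat{z}_{li}$ is indeed
identifiable as the conditional mean of $z_l$, given $m_i$ and $\mathbf{r}_l = \boldsymbol{\rho}_l$.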

A final point is that $\hat{z}_{li}$ is also the minimum mean-square error estimate of
$z_l$, conditioned upon $m_i$ and $\mathbf{r}_l = \boldsymbol{\rho}_l$. This follows immediately from the
fact that, if $y$ is any random variable with mean $\bar{y}$ and $\hat{y}$ is any estimate of
$y$, choosing $\hat{y} = \bar{y}$ minimizes the mean-square error $\overline{(y - \hat{y})^2}$. Indeed, let
$\hat{y}$ be any other number, say

$$\hat{y} = \bar{y} + \Delta. \tag{7.112a}$$

Then

$$\overline{(y - \hat{y})^2} = \overline{(y - \bar{y} - \Delta)^2} = \overline{(y - \bar{y})^2} - 2\Delta\,\overline{(y - \bar{y})} + \Delta^2 = \overline{(y - \bar{y})^2} + \Delta^2, \tag{7.112b}$$

which is minimum when $\Delta = 0$. This is true regardless of the probability
system over which the expectation is taken. In particular, when the probability
system is specified by a set of conditions, $\bar{y}$ is the conditional mean
and $\overline{(y - \bar{y})^2}$ is the conditional variance. Thus Eqs. 7.111a and 7.112b
imply that the mean-square error when $z_l$ is estimated by $\hat{z}_{li}$, given $m = m_i$
and $\mathbf{r}_l = \boldsymbol{\rho}_l$, is just the conditional variance $w_{li}(\mathcal{N}_0/2E_i)$.
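As a numerical cross-check of the estimate and its variance, the sketch below (hypothetical function names; vectors given as coefficient tuples) computes $\hat{z}_{li} = (w_{li}/E_i)(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)$ and the conditional variance $w_{li}\mathcal{N}_0/2E_i$. These can be compared with the familiar linear-Gaussian result for a prior variance $b_l/2$ and a noise variance $\mathcal{N}_0/2$ per component, which is a fact assumed here rather than taken from the text.

```python
def dot(u, v):
    """Inner product of two equal-length coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


def gain_estimate(rho_l, s_i, b_l, n0):
    """Conditional-mean estimate of z_l: (w_li / E_i) * (rho_l . s_i)."""
    e_i = dot(s_i, s_i)
    x = b_l * e_i / n0
    w = x / (1.0 + x)  # weighting factor w_li
    return w * dot(rho_l, s_i) / e_i


def gain_estimate_variance(s_i, b_l, n0):
    """Conditional variance of z_l: w_li * N_0 / (2 * E_i)."""
    e_i = dot(s_i, s_i)
    x = b_l * e_i / n0
    w = x / (1.0 + x)
    return w * n0 / (2.0 * e_i)
```

When $b_lE_i/\mathcal{N}_0$ is large the estimate approaches the simple correlation $(\boldsymbol{\rho}_l \cdot \mathbf{s}_i)/E_i$; when it is small the estimate shrinks toward the prior mean of zero.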

Error probability. Although exact calculation of the minimum attainable
error probability for an $L$-diversity fading channel is quite difficult
even when $M = 2$, it is relatively easy to obtain a useful upper bound.
This is done in the next section. As a preliminary measure we now
determine an exact expression for the $P[\mathcal{E}]$ in the simple case in which
the $L$ transmission gains all have the same mean-square value; that is,

$$\overline{a_l^2} = b; \qquad l = 1, 2, \ldots, L. \tag{7.113}$$
We assume that there are two equally likely messages, $m_1$ and $m_2$, and that
the corresponding lowpass modulator input waveforms at the transmitter,
$s_1(t)$ and $s_2(t)$, are orthogonal and have the same energy, $E_s$. Thus these
signals may be represented by vectors

$$\mathbf{s}_1 = \sqrt{E_s}\,\boldsymbol{\varphi}_1, \qquad \mathbf{s}_2 = \sqrt{E_s}\,\boldsymbol{\varphi}_2. \tag{7.114}$$

In accordance with Eq. 7.106 and its sequel, the optimum decision rule
is then to set $\hat{m} = m_1$ if and only if

$$\frac{1}{E_s}\sum_{l=1}^{2L}(\mathbf{r}_l \cdot \mathbf{s}_1)^2 > \frac{1}{E_s}\sum_{l=1}^{2L}(\mathbf{r}_l \cdot \mathbf{s}_2)^2. \tag{7.115a}$$

An equivalent condition is

$$\sum_{l=1}^{2L} r_{l1}^2 > \sum_{l=1}^{2L} r_{l2}^2, \tag{7.115b}$$

where

$$r_{li} \triangleq \mathbf{r}_l \cdot \boldsymbol{\varphi}_i; \qquad i = 1, 2;\ l = 1, 2, \ldots, 2L. \tag{7.115c}$$
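If $m = m_2$, then $\mathbf{r}_l = z_l\,\mathbf{s}_2 + \mathbf{n}_l$ and

$$r_{l1} = n_{l1}, \qquad r_{l2} = z_l\sqrt{E_s} + n_{l2}; \qquad l = 1, 2, \ldots, 2L. \tag{7.116}$$

Since $n_{l1}$, $n_{l2}$, and $z_l$ are statistically independent, zero-mean Gaussian
random variables with variances

$$\overline{n_{l1}^2} = \overline{n_{l2}^2} = \frac{\mathcal{N}_0}{2}, \qquad \overline{z_l^2} = \tfrac{1}{2}\,\overline{a_l^2} = \tfrac{1}{2}\,b, \tag{7.117a}$$

it follows that the $\{r_{li}\}$ are statistically independent zero-mean Gaussian
random variables with variances

$$\overline{r_{l1}^2} = \frac{\mathcal{N}_0}{2} \triangleq \sigma_1^2, \qquad \overline{r_{l2}^2} = \frac{bE_s + \mathcal{N}_0}{2} \triangleq \sigma_2^2; \qquad l = 1, 2, \ldots, 2L. \tag{7.117b}$$

Defining

$$\mathbf{R}_1 \triangleq (r_{11}, r_{21}, \ldots, r_{2L,1}), \tag{7.118a}$$

$$\mathbf{R}_2 \triangleq (r_{12}, r_{22}, \ldots, r_{2L,2}), \tag{7.118b}$$

we observe from Eq. 7.115b that an error occurs when $m = m_2$ if and
only if

$$|\mathbf{R}_1| > |\mathbf{R}_2|. \tag{7.119}$$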

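Letting $p_1$ and $p_2$ denote, respectively, the conditional density functions of
$|\mathbf{R}_1|$ and $|\mathbf{R}_2|$, given $m = m_2$, we therefore have

$$P[\mathcal{E} \mid m_2] = \int_{-\infty}^{\infty} p_2(\beta)\,d\beta\int_{\beta}^{\infty} p_1(\alpha)\,d\alpha. \tag{7.120}$$

Both $\mathbf{R}_1$ and $\mathbf{R}_2$ are random vectors with $2L$ components, each component
being a zero-mean, identically distributed, statistically independent
Gaussian random variable. The component variance for $\mathbf{R}_1$ is $\sigma_1^2$,
whereas for $\mathbf{R}_2$ it is $\sigma_2^2$. In Appendix 5D we determined the density
function of the magnitude of a similar $N$-component vector with unit-variance
components: from Eqs. 5D.10a and 5D.14,

$$p(\rho) = \frac{N\rho^{N-1}B_N}{(2\pi)^{N/2}}\,e^{-\rho^2/2}, \tag{7.121a}$$

$$B_N = \frac{\pi^{N/2}}{(N/2)!}; \qquad N \text{ even.} \tag{7.121b}$$

Scaling to variance $\sigma_i^2$ and simplifying yields, with $N = 2L$,

$$p_i(\alpha) = \frac{1}{(L-1)!}\,\frac{\alpha}{\sigma_i^2}\left(\frac{\alpha^2}{2\sigma_i^2}\right)^{L-1}e^{-\alpha^2/2\sigma_i^2}; \qquad \alpha \ge 0,\ i = 1, 2. \tag{7.121c}$$

Carrying out the integrations of Eq. 7.120 by parts, we obtain first

$$\int_{\beta}^{\infty} p_1(\alpha)\,d\alpha = e^{-\beta^2/2\sigma_1^2}\left[1 + \frac{\beta^2}{2\sigma_1^2} + \cdots + \frac{1}{(L-1)!}\left(\frac{\beta^2}{2\sigma_1^2}\right)^{L-1}\right]$$

and then

$$P[\mathcal{E} \mid m_2] = \left(\frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\right)^{L}\left[1 + \binom{L}{1}\frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} + \cdots + \binom{2L-2}{L-1}\left(\frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\right)^{L-1}\right]. \tag{7.122a}$$

By symmetry,

$$P[\mathcal{E} \mid m_2] = P[\mathcal{E} \mid m_1] = P[\mathcal{E}]. \tag{7.122b}$$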


The error probability depends on the average received signal energy
$\bar{E}_s = bE_s$ only through the parameters

$$p \triangleq \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \tag{7.123a}$$

$$1 - p = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}. \tag{7.123b}$$

We identify these parameters by observing in Eqs. 7.122 that $P[\mathcal{E}] = p$
when $L = 1$. Introducing the values of $\sigma_1^2$ and $\sigma_2^2$ given by Eq. 7.117b,
we have

$$p = \frac{\mathcal{N}_0/2}{(bE_s + \mathcal{N}_0)/2 + \mathcal{N}_0/2} = \frac{1}{2 + \bar{E}_s/\mathcal{N}_0}, \tag{7.124a}$$

which checks with the error probability for $L = 1$ (no diversity) given by
Eq. 7.88. In terms of $p$, Eq. 7.122 is written concisely as

$$P[\mathcal{E}] = p^L\sum_{j=0}^{L-1}\binom{L+j-1}{j}(1 - p)^j. \tag{7.124b}$$
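The closed form is straightforward to evaluate. The following Python sketch (hypothetical function names; `math.comb` requires Python 3.8 or later) computes $p$ from the mean energy-to-noise ratio per transmission and then the sum $p^L\sum_{j=0}^{L-1}\binom{L+j-1}{j}(1-p)^j$.

```python
from math import comb


def p_single(mean_es_over_n0):
    """p = 1 / (2 + mean E_s / N_0), the error probability for L = 1."""
    return 1.0 / (2.0 + mean_es_over_n0)


def exact_error_probability(L, mean_es_over_n0):
    """Exact P[E] for equal-strength L-fold diversity, binary orthogonal signals."""
    p = p_single(mean_es_over_n0)
    return p ** L * sum(comb(L + j - 1, j) * (1.0 - p) ** j for j in range(L))
```

Two sanity checks follow directly from the formula: for $L = 1$ the result is $p$, and when the mean received energy is zero ($p = 1/2$) the result is $1/2$ for every $L$, as it must be for a pure guess.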


Upper bound on $P[\mathcal{E}]$. Although exact, Eq. 7.124 is cumbrous and
inconvenient to use when $L$ is large. We now obtain an exponentially
tight upper bound on the attainable error probability with binary orthogonal
signals, each having energy $E_s$, which is useful even when the
$L$ transmission gains $\{a_l\}$ do not all have the same mean-square value.
The result is obtained by means of the Chernoff bounding technique.

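The weighting factors of Eq. 7.105a must be used by an optimum
receiver when the

$$\overline{a_l^2} = b_l; \qquad l = 1, 2, \ldots, L, \tag{7.125}$$

are not all equal. For binary signals with equal energy $E_s$,

$$w_l \triangleq w_{l1} = w_{l2} = \frac{b_l(E_s/\mathcal{N}_0)}{1 + b_l(E_s/\mathcal{N}_0)} = \frac{\bar{E}_l/\mathcal{N}_0}{1 + \bar{E}_l/\mathcal{N}_0}; \qquad l = 1, 2, \ldots, 2L, \tag{7.126}$$

where $\bar{E}_l \triangleq b_lE_s$ is the average energy received on the $l$th transmission.
Equation 7.106 then states that the optimum receiver sets $\hat{m} = m_1$ if and
only if

$$\frac{1}{E_s}\sum_{l=1}^{2L} w_l(\mathbf{r}_l \cdot \mathbf{s}_1)^2 > \frac{1}{E_s}\sum_{l=1}^{2L} w_l(\mathbf{r}_l \cdot \mathbf{s}_2)^2. \tag{7.127a}$$

An equivalent expression, analogous to Eq. 7.115b, is

$$\sum_{l=1}^{2L} r_{l1}^2 > \sum_{l=1}^{2L} r_{l2}^2, \tag{7.127b}$$

in which the $\{r_{li}\}$ are now defined to include the weighting factors $\{w_l\}$:

$$r_{l1} \triangleq \sqrt{w_l}\,(\mathbf{r}_l \cdot \boldsymbol{\varphi}_1), \qquad r_{l2} \triangleq \sqrt{w_l}\,(\mathbf{r}_l \cdot \boldsymbol{\varphi}_2); \qquad l = 1, 2, \ldots, 2L. \tag{7.127c}$$

When $m = m_2$,

$$\mathbf{r}_l \cdot \boldsymbol{\varphi}_1 = n_{l1}, \qquad \mathbf{r}_l \cdot \boldsymbol{\varphi}_2 = z_l\sqrt{E_s} + n_{l2},$$

which implies

$$\sigma_{l1}^2 \triangleq \overline{r_{l1}^2} = w_l\,\frac{\mathcal{N}_0}{2} = \frac{\bar{E}_l/2}{1 + \bar{E}_l/\mathcal{N}_0}, \tag{7.128a}$$

$$\sigma_{l2}^2 \triangleq \overline{r_{l2}^2} = \frac{\bar{E}_l/\mathcal{N}_0}{1 + \bar{E}_l/\mathcal{N}_0}\cdot\frac{\bar{E}_l + \mathcal{N}_0}{2} = \frac{\bar{E}_l}{2}. \tag{7.128b}$$

The $\{r_{li}\}$ are again independent zero-mean Gaussian random variables.
In accordance with Eq. 7.127b, an error is made when $m = m_2$ if and
only if

$$\sum_{l=1}^{2L}(r_{l1}^2 - r_{l2}^2) > 0.$$

The probability of this event is

$$P[\mathcal{E}] = P[\mathcal{E} \mid m_2] = E\!\left[f\!\left(\sum_{l=1}^{2L}(r_{l1}^2 - r_{l2}^2)\right)\right], \tag{7.129}$$

in which $f(\ )$ is the unit-step function of Fig. 7.42 and the expectation
over the random variables $\{r_{li}\}$ is conditioned on $m = m_2$.

Figure 7.42 Unit-step function and exponential overbound.

We obtain the Chernoff bound on $P[\mathcal{E}]$ by again observing that for any
$\lambda \ge 0$ the unit step is overbounded by an exponential:

$$f(\alpha) \le e^{\lambda\alpha}; \qquad \lambda \ge 0. \tag{7.130}$$

Thus

$$P[\mathcal{E}] \le E\!\left[\exp\lambda\sum_{l=1}^{2L}(r_{l1}^2 - r_{l2}^2)\right] = \prod_{l=1}^{2L}\overline{\exp(\lambda r_{l1}^2)}\;\overline{\exp(-\lambda r_{l2}^2)}; \qquad \lambda \ge 0. \tag{7.131}$$

In Eq. 7.131 the statistical independence of the $\{r_{li}\}$ has been exploited in
writing the mean of the product as the product of the means.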

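Each of the $4L$ expectations in Eq. 7.131 can be evaluated by means of
the lemma of Eq. 7.67, with $w = \pm\lambda$, $m_x = 0$, and $\sigma_x^2 = \sigma_{l1}^2$ or $\sigma_{l2}^2$.
We then have

$$\overline{\exp(\lambda r_{l1}^2)} = \frac{1}{\sqrt{1 - 2\lambda\sigma_{l1}^2}}, \qquad \overline{\exp(-\lambda r_{l2}^2)} = \frac{1}{\sqrt{1 + 2\lambda\sigma_{l2}^2}}.$$

Hence for $0 \le \lambda < 1/2\sigma_{l1}^2$

$$P[\mathcal{E}] \le \prod_{l=1}^{2L}\left[\left(1 - \frac{\lambda\bar{E}_l}{1 + \bar{E}_l/\mathcal{N}_0}\right)^{-1/2}(1 + \lambda\bar{E}_l)^{-1/2}\right].$$

Since $\bar{E}_l = \bar{E}_{l+L}$, $l = 1, 2, \ldots, L$,

$$P[\mathcal{E}] \le \prod_{l=1}^{L}\frac{1 + \bar{E}_l/\mathcal{N}_0}{\left[1 + \bar{E}_l\left(\dfrac{1}{\mathcal{N}_0} - \lambda\right)\right]\left[1 + \lambda\bar{E}_l\right]}; \qquad 0 \le \lambda < \frac{1}{\mathcal{N}_0} + \frac{1}{\bar{E}_l}. \tag{7.132}$$

The remaining task is to choose $\lambda$ in such a way that the bound is as
tight as possible. The value of $\lambda$ that minimizes the $l$th factor is determined
as follows:

$$\frac{d}{d\lambda}\left\{\left[1 + \bar{E}_l\left(\frac{1}{\mathcal{N}_0} - \lambda\right)\right]\left[1 + \lambda\bar{E}_l\right]\right\} = 0,$$

$$-\bar{E}_l(1 + \lambda\bar{E}_l) + \bar{E}_l\left[1 + \bar{E}_l\left(\frac{1}{\mathcal{N}_0} - \lambda\right)\right] = 0,$$

$$\lambda = \frac{1}{2\mathcal{N}_0}. \tag{7.133}$$

Since this minimizing value is independent of $l$ and falls within the allowable
range of $\lambda$,

$$P[\mathcal{E}] \le \prod_{l=1}^{L}\frac{1 + \bar{E}_l/\mathcal{N}_0}{(1 + \bar{E}_l/2\mathcal{N}_0)^2}. \tag{7.134}$$

Defining

$$p_l \triangleq \frac{1}{2 + \bar{E}_l/\mathcal{N}_0}, \tag{7.135a}$$

the result may be written succinctly as

$$P[\mathcal{E}] \le \prod_{l=1}^{L} 4p_l(1 - p_l). \tag{7.135b}$$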

Equation 7.135 is the desired bound. We note that for $L = 1$ it differs
from the exact expression of Eq. 7.124, $P[\mathcal{E}] = p_1$, by the factor $4(1 - p_1)$.
When the mean-square transmission gains are all equal to $b$, then
$\bar{E}_l = bE_s = \bar{E}_s$ for all $l$ and the bound simplifies to

$$P[\mathcal{E}] \le [4p(1 - p)]^L, \tag{7.136a}$$

$$p = \frac{1}{2 + \bar{E}_s/\mathcal{N}_0}. \tag{7.136b}$$


Figure 7.43 Ratio of $P[\mathcal{E}]$ to Chernoff bound for equal strength $L$-fold diversity,
$M = 2$. (Horizontal axis: $10\log_{10}\bar{E}_s/\mathcal{N}_0$.)
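The ratio plotted in Fig. 7.43 can be reproduced approximately with a few lines of Python. The sketch is illustrative: the function names are hypothetical, the exact probability is taken as $p^L\sum_{j=0}^{L-1}\binom{L+j-1}{j}(1-p)^j$, and the bound as $[4p(1-p)]^L$ with $p = 1/(2 + \bar{E}_s/\mathcal{N}_0)$.

```python
from math import comb


def exact_error_probability(L, mean_es_over_n0):
    """Exact P[E] for equal-strength L-fold diversity, binary orthogonal signals."""
    p = 1.0 / (2.0 + mean_es_over_n0)
    return p ** L * sum(comb(L + j - 1, j) * (1.0 - p) ** j for j in range(L))


def chernoff_bound(L, mean_es_over_n0):
    """Chernoff bound [4 p (1 - p)]^L on the same error probability."""
    p = 1.0 / (2.0 + mean_es_over_n0)
    return (4.0 * p * (1.0 - p)) ** L


def bound_ratio(L, mean_es_over_n0):
    """Ratio of the exact error probability to the Chernoff bound."""
    return exact_error_probability(L, mean_es_over_n0) / chernoff_bound(L, mean_es_over_n0)
```

For very large energy-to-noise ratios the ratio tends to $\binom{2L-1}{L}/4^L$, which follows from letting $p \to 0$ in the two expressions; for $L = 4$ this limit is $35/256 \approx 0.137$.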

The exact expression of Eq. 7.124 and the bound of Eq. 7.136 may be
readily compared by rewriting the exact expression as

$$P[\mathcal{E}] = K(L, \bar{E}_s/\mathcal{N}_0)\,[4p(1 - p)]^L. \tag{7.136c}$$

The parameter $K$ is plotted in Fig. 7.43 as a function of $\bar{E}_s/\mathcal{N}_0$ for several
values of $L$. We observe that $K$ is always less than 0.5 and asymptotically
approaches a constant, for each $L$, as $\bar{E}_s/\mathcal{N}_0$ becomes large. It is not
difficult to show that the value of this constant is close to

$$\left[2\sqrt{\pi L}\,(1 - 2p)\right]^{-1}$$

when $L$ is large.

Optimum Diversity

If the most important parameter in the design of a diversity communication
system is the probability of error obtainable with a fixed
expenditure of energy per message input bit, say $E_b$, Eq. 7.136 implies that
there is an optimum choice for the number of diversity transmissions,
$L$.$^{50,67}$ Assume that the $\{b_l\}$ are all equal and $E_b$ is divided equally among
the transmissions. For $L$ transmissions, the available energy per transmission
is then

$$E_s = \frac{E_b}{L},$$

and the ratio of average received signal energy to noise power density on
each transmission is

$$\eta \triangleq \frac{\bar{E}_s}{\mathcal{N}_0} = \frac{bE_b/L}{\mathcal{N}_0} = \frac{\bar{E}_b/\mathcal{N}_0}{L}, \tag{7.137}$$

in which $\bar{E}_b \triangleq bE_b$ is the mean received energy per bit. We then have

$$p = \frac{1}{2 + \eta},$$

so that the bound of Eq. 7.136 becomes

$$P[\mathcal{E}] \le \left[\frac{4(1 + \eta)}{(2 + \eta)^2}\right]^{L} = 2^{-(\bar{E}_b/\mathcal{N}_0)\,g(\eta)}, \tag{7.138a}$$

$$g(\eta) \triangleq \frac{1}{\eta}\log_2\frac{(2 + \eta)^2}{4(1 + \eta)}. \tag{7.138b}$$

The factor $g(\eta)$ is plotted in Fig. 7.44 as a function of $\eta$. We observe
that the maximum value is approximately 0.215, attained for $\eta \approx 3$ (5 db).
It follows that the probability of error may be made to decrease exponentially
with $\bar{E}_b/\mathcal{N}_0$ by means of diversity,† even though the channel is
subject to fading. The maximum in Fig. 7.44 is quite broad; choosing

$$L = \frac{\bar{E}_b}{3\mathcal{N}_0} \tag{7.139a}$$

yields the bound

$$P[\mathcal{E}] < 2^{-0.215\,\bar{E}_b/\mathcal{N}_0}. \tag{7.139b}$$

† The decision to use equal energy on each transmission is optimum unless $L$ is so
large that the resulting value of $\eta$ is less than 5 db. Values of $L$ larger than that for
which $\eta = 5$ db always reduce the constant in the exponent of the bound on $P[\mathcal{E}]$
below 0.215. See Example 2, Appendix 7B.

By way of comparison, for a nonfading. Gaussian channel with unknown 
phase, we have seen in Eq. 7.68 that the use of two equally likely orthogonal 
signals each with received energy E b yields 

P[£] = i e ~ Ebl2Xa ~ £ 2 - 0 - 72E "/ jv \ (7.140) 



Figure 7.44 The “efficiency factor” $g(\eta)$ for M = 2, as a function of diversity order
L and energy-to-noise ratio per diversity transmission.

Thus fading costs approximately 5.25 db in signal energy. 

The reason why an optimum value of L exists is simple: on the one
hand, as L increases with the total energy held fixed, the average
signal-to-noise ratio at the output of a bandpass filter matched to
$s(t)\sqrt{2}\cos\omega_0 t$ decreases and the loss introduced by incoherent
reception becomes larger.
On the other hand, increasing L provides additional diversity and de- 
creases the probability that most of the transmissions are badly faded. 
The optimum value of L reflects the best compromise between these two 
effects. 
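The location of the optimum can be checked with a few lines of arithmetic. The sketch below assumes the efficiency factor $g(\eta) = (1/\eta)\log_2[(2+\eta)^2/4(1+\eta)]$, which follows from substituting $p = 1/(2+\eta)$ and $L = (\bar{E}_b/\mathcal{N}_0)/\eta$ into the bound $[4p(1-p)]^L$ of Eq. 7.136. The grid search and the helper names are illustrative choices, not part of the original development.

```python
from math import log, log2, log10

def g(eta):
    """Efficiency factor: exponent of the diversity bound per unit Eb/N0 (base 2)."""
    return log2((2.0 + eta) ** 2 / (4.0 * (1.0 + eta))) / eta

def argmax_on_grid(func, lo, hi, steps):
    """Return the grid point in [lo, hi] at which func is largest."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=func)

eta_opt = argmax_on_grid(g, 0.1, 20.0, 19900)   # grid step of 0.001
g_max = g(eta_opt)

# Exponent constant for the nonfading channel of Eq. 7.140:
# exp(-Eb/2N0) = 2^(-Eb/(2 N0 ln 2)).
g_nonfading = 1.0 / (2.0 * log(2.0))
loss_db = 10.0 * log10(g_nonfading / g_max)

if __name__ == "__main__":
    print("optimum eta:", eta_opt)
    print("maximum g:", g_max)
    print("energy cost of fading in db:", loss_db)
```

Because the maximum is broad, the number of transmissions L need only place $\bar{E}_b/L\mathcal{N}_0$ in the neighborhood of the optimum.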

7.5 CODING FOR FADING CHANNELS 

Although diversity may be used to obtain a probability of error that
decreases exponentially as $\bar{E}_b/\mathcal{N}_0$ is increased, an arbitrarily small $P[\mathcal{E}]$
can be obtained in this way only by making $\bar{E}_b/\mathcal{N}_0$ arbitrarily large. Just
as in the unfaded case, coding provides a method for making $P[\mathcal{E}]$ as small
as we please without concomitant increase in transmitted energy per bit,
provided that $\bar{E}_b/\mathcal{N}_0$ exceeds a minimum threshold. That this is so
follows from random-coding arguments similar to those we have
encountered heretofore.

Unquantized Receiver 

We begin by considering the use of the two specific lowpass sequences
$s_0(t)$ and $s_1(t)$ shown in Fig. 7.45 as the modulator input set in
communicating one of two equally likely messages, $m_0$ and $m_1$, over a
Rayleigh-fading channel. Each sequence comprises N transmissions (or elements)
and each element consists of a waveform (or letter) chosen from an
alphabet containing A orthogonal lowpass waveforms $\{x_j(t)\}$. We assume
that each of the A “building-block” waveforms $\{x_j(t)\}$ has energy $E_s$.
Thus $x_j(t) = \sqrt{E_s}\,\varphi_j(t)$, j = 1, 2, ..., A, and the $\{x_j(t)\}$ can be
represented by a set of A orthogonal vectors.

Figure 7.45 Composition of two (N = 5)-element signals: (a) signals; (b) letters.

We also assume that the fades affecting successive elements are statistically
independent and of equal variance, $\overline{a_l^2} = b$ for l = 1, 2, ..., N.
Thus the receiver for the signaling sequences $s_0(t)$ and $s_1(t)$ is very much like
the receiver for an N-diversity system. The difference is that for N-diversity
the same letter would be transmitted during each element, whereas
now successive elements in general consist of different letters. The
receiver must reflect this difference. In particular, setting $\bar{E}_l = \bar{E}_s$ in
Eq. 7.105 and observing that the $\{c_l\}$ and $\{w_{il}\}$ are independent of i and l,
we have

2 iV 1 

Pr(P | m i) ~ 2 7T (Pi ‘ s i ) 2 ’ i = 0, 1, ' (7.141a) 

i-i E s 

in which, if the /th element of s t (/) is the letter x } (t), 

(7.141b) 

The optimum receiver may be implemented as a bank of bandpass
matched filters (one for each of the A orthonormal waveforms $\{\varphi_j(t)\}$)
followed by envelope detectors, squarers, sampling gates, and
combinatorial logic. Such a receiver is shown in Fig. 7.46. At the end of the
transmission of each element, the receiver samples and stores the output
of all A squaring circuits. After the N elements have been transmitted, the
receiver has NA samples stored in memory. From these it selects the
$\{(\boldsymbol{\rho}_l \cdot \boldsymbol{\varphi}_j)^2\}$ requisite for calculating the two sums required by Eq. 7.141.
(The composition of each signal $s_0(t)$ and $s_1(t)$ is presumed to be known.)
The receiver then determines $\hat{m}$ in accordance with the larger of the sums.

In the two-message communication system that we have just described
the error probability is easy to calculate. Assume that $s_0(t)$ and $s_1(t)$
involve identical building-block waveforms (letters) in h of their N
elements. (For the sequences of Fig. 7.45, h = 2.) Clearly, the h overlaps
contribute nothing to message distinguishability. Over each of the
remaining $N - h$ elements, however, the signals are orthogonal. This follows
from the fact that the $\{x_j(t)\}$ are mutually orthogonal. Thus the
probability of error, say $P_2[\mathcal{E} \mid h]$, is just the probability of error for diversity
transmission with $L = N - h$. From Eq. 7.136,

$$P_2[\mathcal{E} \mid h] < [4p(1 - p)]^{N-h} \tag{7.142a}$$

with

$$p = \frac{1}{2 + \bar{E}_s/\mathcal{N}_0}. \tag{7.142b}$$

Figure 7.46 Receiver implementation for coded diversity transmission, M = 2. The
filters are $h_j(t) = \varphi_j(T - t)\sqrt{2}\cos\omega_0 t$, j = 1, 2, ..., and their squared-envelope
outputs are sampled at $t = t_l + T$, l = 1, 2, ..., N. For the signals of Fig. 7.45, the
decision variables $Z_0$ and $Z_1$ are each a sum of squared-envelope samples, one per
element. The squared envelope samples from elements number 3 and 5 make no
contribution to the receiver decision.


Next consider an ensemble of two-message communication systems in
which every possible signaling sequence appears with equal probability.
There are $A^N$ distinct signaling sequences, or codewords, that can be
constructed by using one of A letters in each of N elements. Thus the
probability of the subset of systems which use codewords overlapping in
h positions is

$$P[h] = \binom{N}{h}\left(\frac{1}{A}\right)^{h}\left(1 - \frac{1}{A}\right)^{N-h}. \tag{7.143}$$

Over the ensemble the mean probability of error is therefore

$$\overline{P_2[\mathcal{E}]} \triangleq \sum_{h=0}^{N} P[h]\,P_2[\mathcal{E} \mid h]
< \sum_{h=0}^{N} \binom{N}{h}\left(\frac{1}{A}\right)^{h}\left[\left(1 - \frac{1}{A}\right)4p(1 - p)\right]^{N-h}
= \left[\frac{1}{A} + \frac{A - 1}{A}\,4p(1 - p)\right]^{N}
= 2^{-NR_0}. \tag{7.144}$$
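The collapse of the sum over overlaps into a single Nth power is an application of the binomial theorem, and it can be confirmed numerically. The sketch below assumes $P[h] = \binom{N}{h}(1/A)^h(1 - 1/A)^{N-h}$, the pairwise bound $[4p(1-p)]^{N-h}$, and $R_0 = \log_2\{A/[1 + 4(A-1)p(1-p)]\}$; the function names are illustrative.

```python
from math import comb, log2

def overlap_probability(N, h, A):
    """Probability that two randomly chosen codewords agree in exactly h of N elements."""
    return comb(N, h) * (1.0 / A) ** h * (1.0 - 1.0 / A) ** (N - h)

def pairwise_bound(N, h, p):
    """Chernoff bound for two codewords that overlap in h positions."""
    return (4.0 * p * (1.0 - p)) ** (N - h)

def ensemble_average_bound(N, A, p):
    """Average of the pairwise bound over the ensemble of codeword pairs."""
    return sum(overlap_probability(N, h, A) * pairwise_bound(N, h, p)
               for h in range(N + 1))

def R0(A, p):
    """Error exponent in bits per element."""
    return log2(A / (1.0 + 4.0 * (A - 1) * p * (1.0 - p)))

if __name__ == "__main__":
    N, A, snr = 5, 4, 3.0           # illustrative values only
    p = 1.0 / (2.0 + snr)
    print(ensemble_average_bound(N, A, p), 2.0 ** (-N * R0(A, p)))
```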

The final step in a random-coding argument, as usual, is to extend
consideration to the case in which there are $M = 2^K$ messages. We
again envision an ensemble of communication systems in which each
distinct code consisting of M codewords chosen from the $A^N$-member
code-base set appears with equal probability. As in Chapter 5, the union
argument yields

$$\overline{P[\mathcal{E}]} < M\,\overline{P_2[\mathcal{E}]},$$

which can be written† as

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0 - R_N]}, \tag{7.145a}$$

$$R_0 \triangleq \log_2 \frac{A}{1 + 4(A - 1)p(1 - p)}, \tag{7.145b}$$

$$M = 2^K \triangleq 2^{NR_N}, \tag{7.145c}$$

$$p = \frac{1}{2 + \bar{E}_s/\mathcal{N}_0}. \tag{7.145d}$$

It is instructive to rewrite Eq. 7.145a in a form that places the
dependence of $\overline{P[\mathcal{E}]}$ on the number of bits coded together, K, and the average
received energy per bit, $\bar{E}_b$, directly in evidence. Let

$$L \triangleq \frac{N}{K} = \frac{1}{R_N} \tag{7.146a}$$

denote the number of elements (diversity) per bit, and (as in Eq. 7.137)
define

$$\eta \triangleq \frac{\bar{E}_b}{L\mathcal{N}_0} = \frac{\bar{E}_s}{\mathcal{N}_0}. \tag{7.146b}$$

Then we again have

$$p = \frac{1}{2 + \eta}, \tag{7.147a}$$

$$p(1 - p) = \frac{1 + \eta}{(2 + \eta)^2}. \tag{7.147b}$$

With this notation, Eq. 7.145a becomes

$$\overline{P[\mathcal{E}]} < \exp\{-K(\log_e 2)[(\bar{E}_b/\mathcal{N}_0)\,g_A(\eta) - 1]\}, \tag{7.148a}$$

in which

$$g_A(\eta) \triangleq \frac{1}{\eta}\log_2 \frac{A}{1 + 4(A - 1)(1 + \eta)/(2 + \eta)^2}. \tag{7.148b}$$

† Note that $R_N$ in Eq. 7.145a has units of bits per element, not bits per dimension;
each of the N elements requires A dimensions, since we anticipate that each of the
$\{x_j(t)\}$ will appear as the $l$th element, l = 1, 2, ..., N, in at least one of the M
codewords.

For any $\eta$ the function $g_A(\eta)$ increases monotonically with increasing A.
(The probability that any two code-word sequences overlap in h positions
decreases as the alphabet size A becomes larger.) Furthermore, as
$A \to \infty$ the function $g_A(\eta)$ becomes the function $g(\eta)$ plotted in Fig. 7.44.
Thus when A can be made arbitrarily large, the optimum choice of L is
that value for which $\eta \approx 5$ db. Finally, an easy calculation shows that for
this choice of L any value of $A \geq 18$ yields an exponent that is at least
0.935 times as large as the exponent for $A = \infty$. Thus for $\eta \approx 5$ db and
$A \geq 18$ we have

$$\overline{P[\mathcal{E}]} < 2^{-K[0.2\,\bar{E}_b/\mathcal{N}_0 - 1]}. \tag{7.149}$$


As long as the mean received energy-to-noise ratio per bit $\bar{E}_b/\mathcal{N}_0$ is greater
than approximately 7 db, the probability of error decreases exponentially
with the code constraint length K. Furthermore, this can be accomplished
with a set of building-block signals $\{x_j(t)\}$ comprising only 18 orthogonal
waveforms. If we restrict A to 2 (binary orthogonal signaling), the
optimum value of $\eta$ changes only slightly to approximately 2.7. The
effect on Eq. 7.148, however, is that $\bar{E}_b/\mathcal{N}_0$ must now be slightly greater
than 10 db for the bound to converge to zero with increasing K.
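The thresholds quoted above can be reproduced from the exponent of the random-coding bound. The sketch below assumes $g_A(\eta) = (1/\eta)\log_2\{A/[1 + 4(A-1)(1+\eta)/(2+\eta)^2]\}$, so that the bound decreases with K only when $(\bar{E}_b/\mathcal{N}_0)\,g_A(\eta) > 1$. The grid search over $\eta$ and the helper names are illustrative assumptions.

```python
from math import log2, log10

def g_A(eta, A):
    """Exponent factor for an alphabet of A orthogonal letters."""
    x = 4.0 * (1.0 + eta) / (2.0 + eta) ** 2      # equals 4p(1-p) with p = 1/(2+eta)
    return log2(A / (1.0 + (A - 1) * x)) / eta

def g_inf(eta):
    """Limit of g_A as the alphabet size grows without bound."""
    return log2((2.0 + eta) ** 2 / (4.0 * (1.0 + eta))) / eta

def threshold_db(A, lo=0.05, hi=20.0, steps=19950):
    """Smallest Eb/N0 (in db) for which the exponent is positive at the best eta."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    best = max(g_A(eta, A) for eta in grid)
    return 10.0 * log10(1.0 / best)

if __name__ == "__main__":
    for A in (2, 4, 18, 64):
        print(A, g_A(3.0, A) / g_inf(3.0), threshold_db(A))
```

The printed ratio shows how quickly a finite alphabet approaches the infinite-alphabet exponent; the last column is the threshold energy-to-noise ratio per bit.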

As with the Gaussian channel, the simple union argument used to
derive Eq. 7.149 does not lead to the tightest possible bound for all values
of $\bar{E}_b/\mathcal{N}_0$. Indeed, it can be shown⁴⁹ by more sensitive analytical
techniques that the capacity of an infinite-bandwidth Rayleigh-fading channel
is the same as that of an infinite-bandwidth Gaussian channel with the
same mean energy-to-noise ratio. The bound of Eq. 7.149, however, is
exponentially tight for values of $\bar{E}_b/\mathcal{N}_0$ somewhat greater than 7 db. In
addition, 7 db approximates the minimum value of $\bar{E}_b/\mathcal{N}_0$ for which the
mean number of computations required by the sequential decoding
procedures of Chapter 6 converges.


Binary Quantization 

With pure additive white Gaussian noise, we found it necessary [see
Chapter 6] to quantize the matched-filter outputs to implement a feasible
decoder. Similarly, with a Rayleigh-fading channel we must also reduce
the decoder input to discrete form. The probability of error bounds of
Eqs. 7.145 and 7.149 are meaningful in an engineering sense only when the
squared-envelope samples resulting from the transmission of each element
are quantized with sufficiently fine grain. In this section we illustrate the
degradation that results when the quantization grain is coarse. In
particular, we consider the case A = 2, so that the building-block alphabet
consists of only two letters, which we may take to be the orthogonal
lowpass waveforms $x_1(t) = \sqrt{E_s}\,\varphi_1(t)$ and $x_2(t) = \sqrt{E_s}\,\varphi_2(t)$.

In the transmission of the $l$th element the optimum receiver observes the
two squared-envelope samples, say $X_{l1}^2$ and $X_{l2}^2$, produced at the output
of bandpass filters matched to $\varphi_1(t)\sqrt{2}\cos\omega_0 t$ and $\varphi_2(t)\sqrt{2}\cos\omega_0 t$,
respectively. It is clear that symmetric binary quantization (see Fig. 7.47)
of the difference of these two samples $X_{l1}^2 - X_{l2}^2$ corresponds to making
an optimum binary decision about which letter was actually transmitted
as the $l$th element, under the assumption that both letters are equally
likely. If successive received elements are quantized in this same way,
l = 1, 2, ..., N, the channel is converted into a binary symmetric channel
with transition probability

$$p = \frac{1}{2 + \bar{E}_s/\mathcal{N}_0} \tag{7.150a}$$

equal to the error probability for a single binary decision (Eq. 7.88).

Figure 7.47 Symmetric binary quantization of the difference of squared envelope
samples.

The transmission of N elements in succession corresponds to N uses of
the BSC, so that from Eq. 6.70

$$\overline{P[\mathcal{E}]} < 2^{-N[R_0' - R_N]}, \tag{7.150b}$$

$$R_0' = 1 - \log_2 [1 + \sqrt{4p(1 - p)}]. \tag{7.150c}$$

The error exponent $R_0'$ is plotted as a function of $\bar{E}_s/\mathcal{N}_0$ in Fig. 7.48,
together with the corresponding unquantized exponent

$$R_0 = 1 - \log_2 [1 + 4p(1 - p)], \tag{7.151}$$

which results from specializing Eq. 7.145b to A = 2.
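The cost of binary quantization can be tabulated directly from the two exponents. The sketch below assumes $R_0' = 1 - \log_2[1 + \sqrt{4p(1-p)}]$ for the quantized channel and $R_0 = 1 - \log_2[1 + 4p(1-p)]$ for the unquantized one, with $p = 1/(2 + \bar{E}_s/\mathcal{N}_0)$; the function names and the sample energy ratios are illustrative.

```python
from math import log2, log10, sqrt

def transition_probability(snr):
    """Crossover probability of the equivalent binary symmetric channel."""
    return 1.0 / (2.0 + snr)

def r0_binary_quantized(snr):
    """Exponent R0' after symmetric two-level quantization (bits per element)."""
    p = transition_probability(snr)
    return 1.0 - log2(1.0 + sqrt(4.0 * p * (1.0 - p)))

def r0_unquantized(snr):
    """Exponent R0 for the unquantized receiver with A = 2 (bits per element)."""
    p = transition_probability(snr)
    return 1.0 - log2(1.0 + 4.0 * p * (1.0 - p))

if __name__ == "__main__":
    for snr in (1.0, 3.0, 10.0, 100.0):
        print(round(10.0 * log10(snr), 1), r0_binary_quantized(snr), r0_unquantized(snr))
```

Since $\sqrt{x} > x$ for $0 < x < 1$, the quantized exponent is always the smaller of the two whenever the energy-to-noise ratio is positive.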
Figure 7.48 $R_0$ and $R_0'$ for binary orthogonal signaling on a Rayleigh-fading channel
with two- and three-level symmetric quantization. (Abscissa: average energy ratio,
$10 \log_{10} \bar{E}_s/\mathcal{N}_0$.)

Null-Zone Quantization 

The performance degradation entailed by binary quantization can be
reduced by using a three-level quantizer, as shown in Fig. 7.49. If, for
any element l = 1, 2, ..., N, the input to the quantizer is the difference

$$y = (X_{l1}^2 - X_{l2}^2), \tag{7.152a}$$

the output is

$$z = \begin{cases} +1; & y > J \\ 0; & -J \leq y \leq J \\ -1; & y < -J. \end{cases} \tag{7.152b}$$

Thus the channel is converted into a null-zone channel (cf. Fig. 6.23) with
transition probabilities q, w, and $1 - q - w$, which are functions of the
threshold J. From Eq. 6.73 the resulting error exponent is

$$R_0' = 1 - \log_2 [1 + w + 2\sqrt{q(1 - q - w)}]. \tag{7.153}$$
We choose J in such a way that $R_0'$ is maximum. The first step is to
determine the probability density function, say $p_y(\ \mid x_2)$, of the quantizer
input that results when the transmitted letter is $x_2$. Clearly,

$$p_y(\gamma \mid x_1) = p_y(-\gamma \mid x_2). \tag{7.154}$$

Figure 7.49 Symmetric three-level quantization and the discrete channel that results
therefrom: quantizer output versus quantizer input $y = X_{l1}^2 - X_{l2}^2$, and (b) the
resulting channel with outputs +1, 0, and -1 and transition probabilities q, w, and
$1 - q - w$.

The density function $p_y(\ \mid x_2)$ may be found from its characteristic
function. For the $l$th diversity element we have

y = (4 + 4+t.i) - (4 + r\ +l>2 ), (7.155a) 

in which each squared term is a statistically independent zero-mean 
Gaussian random variable with variance (from Eq. 7.117b) 


x 2 transmitted. (7.155b) 
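Thus for each $l$ the characteristic function of y conditional on $x_2$ is the
product of the characteristic functions of the four independent squared
terms of Eq. 7.155a,

$$M_y(\nu) \triangleq \overline{e^{j\nu y}} = \frac{1}{[1 - j\nu\mathcal{N}_0][1 + j\nu(\bar{E}_s + \mathcal{N}_0)]}. \tag{7.156}$$

In the last step we have again invoked the lemma of Eq. 7.67, this time
with $w = \pm j\nu$. Expanding the right-hand side of Eq. 7.156 in a partial
fraction yields

$$M_y(\nu) = \frac{p}{1 - j\nu\mathcal{N}_0} + \frac{1 - p}{1 + j\nu\mathcal{N}_0(1 - p)/p}, \tag{7.157a}$$

where, as usual,

$$p = \frac{1}{2 + \bar{E}_s/\mathcal{N}_0}. \tag{7.157b}$$

The inverse Fourier transform of $M_y(\nu)$ is

$$p_y(\gamma \mid x_2) = \begin{cases} \dfrac{p}{\mathcal{N}_0}\exp\left(-\dfrac{\gamma}{\mathcal{N}_0}\right); & \gamma \geq 0, \\[2ex] \dfrac{p}{\mathcal{N}_0}\exp\left(\dfrac{\gamma}{\mathcal{N}_0}\,\dfrac{p}{1 - p}\right); & \gamma < 0, \end{cases} \tag{7.158}$$

a result that may readily be verified by using Eq. 7.158 to recalculate
$M_y(\nu)$.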

The transition probabilities for the null-zone channel
created by a three-level quantizer with threshold J are

$$q = \int_J^{\infty} p_y(\gamma \mid x_2)\,d\gamma = p\,e^{-J/\mathcal{N}_0} \tag{7.159a}$$

and

$$w = \int_{-J}^{J} p_y(\gamma \mid x_2)\,d\gamma
= p(1 - e^{-J/\mathcal{N}_0}) + (1 - p)\left[1 - \exp\left(-\frac{J}{\mathcal{N}_0}\,\frac{p}{1 - p}\right)\right]
= 1 - p\,e^{-J/\mathcal{N}_0} - (1 - p)\exp\left(-\frac{J}{\mathcal{N}_0}\,\frac{p}{1 - p}\right). \tag{7.159b}$$

Our objective is to maximize the value assumed by $R_0'$ when q and w are
substituted in Eq. 7.153. We therefore wish to minimize the quantity

$$w + 2\sqrt{q(1 - q - w)}.$$
It is readily verified by differentiation that the optimum value of J is
given by

$$\frac{J}{\mathcal{N}_0} = \frac{1 - p}{1 - 2p}\,\ln\left(1 + \frac{\bar{E}_s}{\mathcal{N}_0}\right). \tag{7.160}$$

The error exponent $R_0'$ for the null-zone channel obtained with the
optimum threshold setting is also plotted in Fig. 7.48 as a function of
$\bar{E}_s/\mathcal{N}_0$. We note that the curve falls between the unquantized and
binary-quantized error exponents, as it should. It is also apparent that $R_0'$
cannot be substantially increased by using more than three quantization
levels when the alphabet size A is restricted to 2. This does not mean,
however, that the problem of effectively reducing the relevant received
data to discrete form is always trivial. For alphabet sizes A larger than 2,
strategies more subtle than the independent quantization of the
squared-envelope sample for each letter are indicated (cf. Problem 6.10).
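The choice of threshold can be examined numerically. The sketch below works with the normalized threshold $t = J/\mathcal{N}_0$ and assumes the transition probabilities $q = p\,e^{-t}$ and $1 - q - w = (1-p)\exp[-tp/(1-p)]$, together with the closed-form optimum $t = [(1-p)/(1-2p)]\ln(1 + \bar{E}_s/\mathcal{N}_0)$ obtained by setting the derivative of $w + 2\sqrt{q(1-q-w)}$ to zero. The grid search is an independent check on that closed form; all names are illustrative.

```python
from math import exp, log, log2, sqrt

def null_zone_probs(snr, t):
    """Return (q, w) for normalized threshold t = J/N0 and energy ratio snr = Es/N0."""
    p = 1.0 / (2.0 + snr)
    q = p * exp(-t)                                # wrong letter beyond the threshold
    correct = (1.0 - p) * exp(-t * p / (1.0 - p))  # right letter beyond the threshold
    w = 1.0 - q - correct                          # output falls in the null zone
    return q, w

def r0_null_zone(snr, t):
    """Exponent of the null-zone channel (bits per element)."""
    q, w = null_zone_probs(snr, t)
    correct = max(1.0 - q - w, 0.0)
    return 1.0 - log2(1.0 + w + 2.0 * sqrt(q * correct))

def optimum_threshold(snr):
    """Closed-form normalized threshold that maximizes the exponent."""
    p = 1.0 / (2.0 + snr)
    return (1.0 - p) / (1.0 - 2.0 * p) * log(1.0 + snr)

def grid_search_threshold(snr, hi=20.0, steps=20000):
    """Numerical maximization of the exponent over t in [0, hi]."""
    grid = [hi * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: r0_null_zone(snr, t))

if __name__ == "__main__":
    snr = 3.0
    print(optimum_threshold(snr), grid_search_threshold(snr))
    print(r0_null_zone(snr, 0.0), r0_null_zone(snr, optimum_threshold(snr)))
```

Setting t = 0 removes the null zone and recovers the binary-quantized exponent, which gives a convenient baseline for the comparison.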

Discussion 

The objective in all types of diversity communication is to obtain L
received signals for which the transmission gains $\{a_l\}$ are statistically
independent. If this condition is to be met, it is essential that the phase
interference pattern of the wavelets from individual scatterers be distinctly
different for each of the L received signals.

Figure 7.50 Space diversity (transmitter, two scatterers, and two receiver sites).

One way to accomplish this with only a single transmitted signal is
called space diversity. Consider the situation illustrated in Fig. 7.50 and
assume that the transmitter, the two scatterers, and the first receiver site
are held fixed in space. Thus the phase difference between the two
received wavelets at site 1 is fixed. Now consider moving site 2 away from
site 1. The phase at site 2 depends on its position and rotates through $2\pi$
radians as the difference in path length from transmitter to scatterers to
site 2 changes by an amount equal to the wavelength of the RF carrier.
When the two sites are far removed from each other, small percentage
changes in the geometry introduce radical changes in the difference
between the received phase at sites 1 and 2. If myriad scatterers are moving
at random through the scattering volume intersected by the antenna
beams, it is usually reasonable to assume that the transmission gains to the
two sites will be statistically independent whenever the separation between
sites is many carrier wavelengths. L-diversity may be obtained by using
L dispersed receivers.

A similar rationale justifies an assumption of statistical independence
when frequency diversity is used. In this technique there is only one
receiver, but the lowpass signal simultaneously modulates L different
carriers, each having a different frequency. The difference in path length
for two stationary scatterers is now fixed in number of meters but varies
in number of wavelengths as a function of carrier frequency. With many
moving scatterers, good diversity usually results whenever the greatest
difference in path length within the scattering volume implies a phase
difference greater than $2\pi$ between adjacent received sidebands.

The objective with space diversity but no signal diversity is to provide
enough receivers sufficiently separated that a tolerable error probability is
obtained for a given mean received energy-to-noise ratio per bit, $\bar{E}_b/\mathcal{N}_0$.
Additional design freedom is introduced when signal diversity is
considered. This may be accomplished by time or frequency diversity, a
combination of the two, or by other means.⁴⁹ In accordance with Eq.
7.139a, the objective is to obtain $\bar{E}_b/L\mathcal{N}_0 \approx 3$. The attainable error
probability is then bounded by Eq. 7.139b if no coding is used and by
Eq. 7.149 if there is orthogonal-letter coding with $A \geq 18$.
The filter G(f) in Eq. 7.7b was determined by a general method which
always guarantees that both G(f) and $G^{-1}(f)$ are physically realizable
whenever $S_n(f)$ can be written as a ratio of two polynomials in f:†

$$S_n(f) = k\,\frac{N(f)}{D(f)} = k\,\frac{(f - \zeta_1)(f - \zeta_2)\cdots(f - \zeta_n)}{(f - \eta_1)(f - \eta_2)\cdots(f - \eta_d)}. \tag{7A.1}$$

In Eq. 7A.1 $\{\zeta_i\}$ is the set of complex roots of the numerator polynomial
N(f), assumed to have degree n, and $\{\eta_i\}$ is the set for the denominator
polynomial D(f), assumed to have degree d. The set $\{\zeta_i\}$ is the zeros, and
$\{\eta_i\}$ the poles, of $S_n(f)$. Since $S_n(f) = S_n(-f) = S_n^*(f)$ (a power
spectrum is real and even), both zeros and poles are symmetrically located
about the real and imaginary axis, as shown in Fig. 7A.1. It is convenient
to number the roots so that odd-numbered roots have positive imaginary
parts and even-numbered roots have negative imaginary parts. For
example, in the power spectrum of Eq. 7.7a,

$$N(f) = f^2 + 4 = (f - j2)(f + j2); \quad \zeta_1 = j2, \ \zeta_2 = -j2,$$
$$D(f) = f^2 + 1 = (f - j1)(f + j1); \quad \eta_1 = j1, \ \eta_2 = -j1.$$

† If $S_n(f)$ is not rational, it may still be possible to approximate it sufficiently closely
by a rational function.
The whitening filter, G(f), may now be easily specified. Define

$$N_U(f) = (f - \zeta_1)(f - \zeta_3)\cdots(f - \zeta_{n-1}), \tag{7A.2}$$

$$N_L(f) = (f - \zeta_2)(f - \zeta_4)\cdots(f - \zeta_n), \tag{7A.3}$$

$$D_U(f) = (f - \eta_1)(f - \eta_3)\cdots(f - \eta_{d-1}), \tag{7A.4}$$

$$D_L(f) = (f - \eta_2)(f - \eta_4)\cdots(f - \eta_d), \tag{7A.5}$$

and set

$$G(f) = \frac{D_U(f)}{N_U(f)}. \tag{7A.6}$$

Thus the numerator of G(f) incorporates the upper half-plane poles of
$S_n(f)$ and the denominator incorporates the upper half-plane zeros of
$S_n(f)$. Clearly,

$$|G(f)|^2 = \frac{D_U(f)\,D_U^*(f)}{N_U(f)\,N_U^*(f)} = \frac{D_U(f)\,D_L(f)}{N_U(f)\,N_L(f)} = \frac{D(f)}{N(f)}, \tag{7A.7}$$

and therefore $|G(f)|^2 S_n(f) = k$, so that k is identified as the power density
$\mathcal{N}_0/2$ at the output of the whitening filter.

For the spectrum of Eq. 7.7a,

$$G(f) = \frac{f - j1}{f - j2} = \frac{jf + 1}{jf + 2}.$$

It remains to be shown that both filters G(f) and $G^{-1}(f) = N_U(f)/D_U(f)$
are realizable. Consider a filter transfer function of the form

$$\frac{\prod_{i=1}^{n/2}(f - \zeta_{2i-1})}{\prod_{i=1}^{d/2}(f - \eta_{2i-1})}.$$

Letting $s = j2\pi f$, we may express the filter characteristic in terms of the
complex frequency variable s as

$$\frac{\prod_i (s/j2\pi - \zeta_{2i-1})}{\prod_i (s/j2\pi - \eta_{2i-1})} = (j2\pi)^{(d-n)/2}\,\frac{\prod_i (s - j2\pi\zeta_{2i-1})}{\prod_i (s - j2\pi\eta_{2i-1})}.$$

If $\eta_i = a + jb$ has positive imaginary part, the s-plane pole
$j2\pi\eta_i = -2\pi b + j2\pi a$ has negative real part and lies in the left-half s-plane. All
denominator roots of both G(f) and $G^{-1}(f)$ satisfy this condition of
positive imaginary part. Thus all the poles of both filters fall in the
left-half s-plane, hence both are realizable. Filtering with G(f) is therefore a
reversible operation.
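The construction can be exercised on the worked example. The sketch below assumes a rational spectrum $S_n(f) = k\,N(f)/D(f)$ specified by its root lists, builds G(f) from the upper half-plane poles and zeros, and checks that $|G(f)|^2 S_n(f) = k$ at a few frequencies. The helper names and the value k = 1 are illustrative assumptions, and roots on the real axis are not handled.

```python
from math import pi

def product(f, roots):
    """Evaluate prod_i (f - root_i) at the real frequency f."""
    value = 1.0 + 0.0j
    for r in roots:
        value *= (f - r)
    return value

def spectrum(f, zeros, poles, k=1.0):
    """Rational power spectrum S_n(f) = k N(f)/D(f); real for symmetric root sets."""
    return k * (product(f, zeros) / product(f, poles)).real

def whitening_filter(zeros, poles):
    """Return G(f) = D_U(f)/N_U(f), using only roots with positive imaginary part."""
    upper_zeros = [z for z in zeros if z.imag > 0]
    upper_poles = [e for e in poles if e.imag > 0]
    return lambda f: product(f, upper_poles) / product(f, upper_zeros)

# Example spectrum (f^2 + 4)/(f^2 + 1): zeros at +/- j2, poles at +/- j1.
zeros = [2j, -2j]
poles = [1j, -1j]
G = whitening_filter(zeros, poles)

if __name__ == "__main__":
    for f in (-3.0, 0.0, 0.5, 2.0):
        print(f, abs(G(f)) ** 2 * spectrum(f, zeros, poles))
    # s-plane poles of G lie at j*2*pi*zeta for each upper half-plane zero zeta.
    print([1j * 2.0 * pi * z for z in zeros if z.imag > 0])
```

The last line prints the s-plane pole of G(f) for this example, which falls on the negative real axis as the realizability argument requires.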

Roots of $S_n(f)$ on the real axis are always of even multiplicity; the
situation may be handled mathematically by assigning consecutive indices
to any such root. Of course, these roots correspond to lossless resonances
and do not occur in practice.

A more substantive issue is that any $S_n(f)$ encountered in practice
goes to zero as $f \to \infty$. Because all physical circuits ultimately become
capacitive at high frequencies, ideal whitening is impossible. The difficulty
is resolved by recognizing that G(f) need only “whiten” the noise over
the frequency band containing most of the transmitted signal energy.


APPENDIX 7B CONVEXITY 

A function f is said to be “convex” over $[0, \infty]$ if its second derivative
satisfies

$$f''(x) \leq 0; \quad \text{all } x \geq 0. \tag{7B.1a}$$

The function f is “concave” over $[0, \infty]$ if

$$f''(x) \geq 0; \quad \text{all } x \geq 0. \tag{7B.1b}$$

If the equality sign in Eq. 7B.1a (7B.1b) is not permitted, f is “strictly
convex” (or “strictly concave”). A convex function over $[0, \infty]$ is
pictured in Fig. 7B.1a and a strictly convex function over $[0, \infty]$ in
Fig. 7B.1b.

Figure 7B.1 A convex and a strictly convex function over $[0, \infty]$.
Theorem. Let $\{x_i\}$, i = 1, 2, ..., N, be a set of real numbers subject to
the constraints

$$x_i \geq 0, \quad i = 1, 2, \ldots, N \tag{7B.2a}$$

and

$$\sum_{i=1}^{N} x_i = K, \tag{7B.2b}$$

and let $\bar{f}$ be defined as

$$\bar{f} \triangleq \frac{1}{N}\sum_{i=1}^{N} f(x_i). \tag{7B.3}$$

If f is convex over $[0, \infty]$, then

$$\bar{f} \leq f(\bar{x}) \tag{7B.4a}$$

and

$$\bar{f} \geq \frac{1}{N} f(K), \tag{7B.4b}$$

in which

$$\bar{x} \triangleq \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{K}{N}. \tag{7B.4c}$$

Whenever f is strictly convex, the equal sign holds in Eq. 7B.4a if and
only if $x_i = \bar{x}$, i = 1, 2, ..., N.



If f is concave, $-f$ is convex, and the statements of the theorem apply
with the inequalities reversed. We henceforth restrict our attention to
convex functions.

Proof. Since the function f must be continuous in order for $f''$ to exist
and each of the $\{x_i\}$ is restricted to the closed interval [0, K],

$$\bar{f} = \frac{1}{N}\sum_{i=1}^{N} f(x_i)$$

must take on a maximum and a minimum as the $\{x_i\}$ are varied. Let f be
strictly convex. We now prove by a geometrical argument that $\bar{f}$ is
maximum when $x_i = \bar{x}$ for all i. For assume that some set $\{x_i\}$ not
satisfying $x_i = \bar{x}$, all i, produces the maximum. At least one pair of the
$\{x_i\}$ must be unequal, say $x_1 \neq x_2$. However, it is clear from Fig. 7B.2 that
replacing both $x_1$ and $x_2$ by $(x_1 + x_2)/2$ increases $\bar{f}$ without changing
$\sum x_i = K$. Hence the assumption that any set of $\{x_i\}$ not satisfying
$x_i = \bar{x}$, all i, produces the maximum leads to a contradiction and the
proof of Eq. 7B.4a is complete.

A similar argument proves that the set $\{x_1 = K, x_2 = x_3 = \cdots = x_N = 0\}$
minimizes $\bar{f}$, which verifies Eq. 7B.4b: it is necessary only to
observe as in Fig. 7B.3 that if $x_1 = a$, $x_2 = b$, $0 < a \leq b$, then $\bar{f}$ is
decreased by setting $x_1 = 0$ and $x_2 = a + b$.

Figure 7B.3 Decrease in $\bar{f}$ possible when at least two of the $\{x_i\}$ are nonzero.
(Ordinates marked: $\tfrac{1}{2}[f(a) + f(b)]$ and $\tfrac{1}{2}[f(0) + f(a + b)]$.)

Whenever f is convex, but not strictly so, the possibility exists that
$f(\bar{x})$ will fall on a straight-line segment of f. If so, any choice of the $\{x_i\}$,
with sum K, such that each $f(x_i)$, i = 1, 2, ..., N, also lies on the same
straight-line segment, will achieve the same maximum.

Extension of Eq. 7B.4a to a positive random variable involves little
more than a change in notation: in particular, we now place a
constraint on E[x] rather than on $\sum x_i$. For instance, let x be a random
variable such that

$$\int_0^{\infty} p_x(\alpha)\,d\alpha = 1, \tag{7B.5a}$$

$$\int_0^{\infty} \alpha\,p_x(\alpha)\,d\alpha = \bar{x}, \tag{7B.5b}$$

in which $\bar{x}$ is a specified positive number. Observe that when f is convex,

$$f(\alpha) \leq f(\bar{x}) + (\alpha - \bar{x})\,f'(\bar{x}); \quad \alpha \geq 0.$$

Thus

$$\overline{f(x)} \leq f(\bar{x}) + \overline{(x - \bar{x})}\,f'(\bar{x}) = f(\bar{x}). \tag{7B.6}$$

If f is strictly convex, the equality will hold if and only if

$$p_x(\alpha) = \delta(\alpha - \bar{x}). \tag{7B.7}$$

Example 1 . In Eq. 7.38b, 

P[S] = Q(aj2EjX 0 ), 


Q (a) 



where a is a positive random variable. Since the function 
f{x) 4 Q(xj2Ej3f o y, * > 0 
is strictly concave, as shown in Fig. 7B.4, we have 

P[S] > Qiasj2EjX~\ (7B.8) 

with equality if and only if a = a with probability one. 

Extension to Nonconvex Functions 

We now extend Eq. 7B.6 to nonconvex functions. Consider the function 
/ defined in Fig. 7B.5a over [0, co], We can construct from / a unique 
convex function/* such that /</* for all x in [0, co]. We do so by 
starting at the origin and following / to the first point, say a, at which a 


/ 




Figure 7B.5 Construction of a convex overbound to /(a;). 


line tangent to / at a is also tangent to / for some x> a, say b. The 
straight line connecting/^) and fib) is made part of/ * and the process is 
continued. This process is made clear in Fig. 7B.56. 

The only exception to this construction of/* occurs when a line 
tangent to/ at the origin falls below / at some positive argument. In this 
case the first segment of/* is a straight line passing through the origin and 
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tangent to /at some point, say e. If several points of tangency are possible, 
the one yielding maximum slope is chosen, as shown in Fig. 7B.5c. 

Since /</*, and /* is convex, we have 

TV) < 7 * 0*0 </*(*); (7B.9) 

the second inequality follows from Eq. 7B.6. 

We often wish to find a p x , with specified mean x, such that fix ) is 
maximized. We now show that there is always a density p x with mean x 
such that f{x ) actually equals f*(x). There are two situations. First, if x 
is such that fix) = /*(»), the choice pja.) = <5(a — x) produces the 
desired maximum. Second, if x is such that fife) ^ / *(£), then x lies in an 
interval over which / * is a straight-line segment. As noted earlier, we can 
distribute x over the straight-line segment without changing the value 
of / *(x). In particular, we can place the probability in two impulses, one 
at each of the end points, say a and b, of the straight line. In this case 

- b) 

and 

/*(*) = «/*(«) + (1 - «)/*(&) 

= a/(a) + (1 - «)/0) 

where a is chosen to satisfy the constraint equation 
x = aa -j~ (1 — a )b. 


Example 2. In Eq. 7.134 we have

$$P[\mathcal{E}] < \prod_{l=1}^{L}\left[\frac{4(1 + \bar{E}_l/\mathcal{N}_0)}{(2 + \bar{E}_l/\mathcal{N}_0)^2}\right], \tag{7B.10a}$$

where

$$\sum_{l=1}^{L}\frac{\bar{E}_l}{\mathcal{N}_0} = \frac{\bar{E}_b}{\mathcal{N}_0}. \tag{7B.10b}$$

By varying the $\{\bar{E}_l\}$ subject to Eq. 7B.10b, we wish to minimize the bound
of Eq. 7B.10a or, equivalently, to maximize

$$\bar{f} = \frac{1}{L}\sum_{l=1}^{L} f(x_l), \tag{7B.11a}$$

in which

$$f(x) \triangleq \ln\frac{(2 + x)^2}{4(1 + x)} \tag{7B.11b}$$

and

$$x_l \triangleq \frac{\bar{E}_l}{\mathcal{N}_0}. \tag{7B.11c}$$

The function f(x) is plotted in Fig. 7B.6, together with the convex
function $f^*(x)$. If $\bar{x}$ (which corresponds to the over-all average received
energy-to-noise ratio per diversity path, $\bar{E}_b/L\mathcal{N}_0$) is greater than $\approx 3$, the
minimum bound occurs when all $x_l = \bar{x}$. If $\bar{x}$ is less than $\approx 3$, the
minimum occurs when several of the $x_l$ are set equal to zero and the remainder
are set approximately equal to 3. In the latter case several of the available
diversity paths are not used.

Figure 7B.6 Graph for minimizing $P[\mathcal{E}]$ in L-diversity signaling.
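The two regimes of Example 2 can be illustrated with a small computation. The sketch below assumes $f(x) = \ln[(2+x)^2/4(1+x)]$, so that the bound on the error probability is $\exp[-\sum_l f(x_l)]$ and a larger sum means a tighter bound. The energy totals and the number of paths are illustrative values chosen to fall on either side of the break point near 3.

```python
from math import exp, log

def f(x):
    """Per-path exponent: minus the natural log of 4(1+x)/(2+x)^2."""
    return log((2.0 + x) ** 2 / (4.0 * (1.0 + x)))

def exponent(energies):
    """Total exponent for a list of per-path energy-to-noise ratios."""
    return sum(f(x) for x in energies)

def bound(energies):
    """Chernoff bound on the error probability for the given allocation."""
    return exp(-exponent(energies))

if __name__ == "__main__":
    # Eight available paths, total energy-to-noise ratio 12: the mean is 1.5.
    print(bound([1.5] * 8), bound([3.0] * 4 + [0.0] * 4))
    # Eight available paths, total energy-to-noise ratio 40: the mean is 5.
    print(bound([5.0] * 8), bound([10.0] * 4 + [0.0] * 4))
```

In the first case leaving half of the paths unused gives the smaller bound; in the second the equal split over all paths does, in agreement with the example.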


APPENDIX 7C LEMMA 

In this appendix we prove the following lemma : 

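Lemma. If x is a Gaussian random variable with mean m and variance
$\sigma^2$ and w is any complex constant with real part less than $(2\sigma^2)^{-1}$, then

$$\overline{e^{wx^2}} = \frac{1}{\sqrt{1 - 2w\sigma^2}}\exp\left[\frac{wm^2}{1 - 2w\sigma^2}\right]; \quad \mathrm{Re}(w) < \frac{1}{2\sigma^2}. \tag{7C.1}$$

Proof. By definition,

$$\overline{e^{wx^2}} = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} e^{w\alpha^2}\,e^{-(\alpha - m)^2/2\sigma^2}\,d\alpha. \tag{7C.2}$$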


After completing the square in the exponent, this equation becomes 




~lx = — exp 

-CO sJ2lT o 


1 — 2w<7 2 
2(7 2 


1 — 2 wo 2 


Defining 


we have 


A / 

= la — 


1 -- 2 wo 


V 1 — 2 wo 2 


v/l — 2wo 2 


. 2 /(1-2m>< 7 2 ) f 1 -0 2 


e-t'Up, 


(7C.3) 


(7C.4) 


(7C.5) 


in which F is the path traversed by the complex / 

variable as the real variable a varies from y' 

-ooto+co. 2cr2y 

The lemma is proved by showing that the inte- / 

gral in Eq. 7C.5 is unity whenever Re(w’) < (2c 2 )- 1 . <t> 

The first step is to determine V. Let , 0 2 

r 1-2 a* u 

W = u + \v (7C.6) Figure 7C.1 Polar trans- 

and define formation for evaluating 

ce-i* 4 l _ 2wo 2 = (1 ~ 2 o 2 u) - ]2<rV (7C.7) mtegra1. 

Thus c and <f> are given by the polar transformation of Fig. 7C.1. From 
Eqs. 7C.4 and 7C.7 


m : 

= a e‘ 

c 




— yJL ae -w 2 _ HL e +)4>i2 


m \ _ 


It follows that 


m \ $ 


V 1 1,1 \ Y .1 . rn i . 

= — la 1 cos — — jl a 4 1 sin 


Re(/S) = 0 for a = + - , 
c 


(7C.8) 


Im(/?) = 0 for a = 

c 






Re jS 


Figure 7C.2 Path of integration. 

The locus T is therefore the infinite straight line of slope —tan ^<f> shown 
in Fig. 7C.2. 

Since the function e~ fi2/2 has no finite poles, the integral around the 
closed contour in Fig. 7C.3 is zero and 


f = lim f + ( + I 

J A-y co J J J 


(7C.9) 


r L r i r 3 J 

We assume first that A is finite and then take the limit as A -> co. Since 

lim f-J= f*' 1 * d/3 =[" 4= <r f ‘ /s dfi = 1. (7C.10) 

A~*o o J -tJ2-TT J- oo \J2tT 

r 2 

lm 0 
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it follows that the integral of Eq. 7C.5 is also equal to one whenever 


lim 

A~>Ki 


+ 

ih r 3 


= 0. 


Now consider T 3 . For all points along this path (3 = A — jy, 
0 < y < A tan Thus 


j /'A tan <t >/ 2 


^2i 


,-U-iy>72/_ 


(-j rfy). 


(7C.11) 


Taking the absolute value of both sides of Eq. 7C.1 1 yields 

<*A tan <t>l 2 
10 


1 

V 27r 


_ y -j2Ay)/2(_0 dy 

f' A tan <£/2 

\-j e -U-y->2Ay),2\d y 




, ‘A tan 0/2 

i A*/2 I „+y*/t 


It is clear that 


< { A tan - j exp 
2 


l *dy 

4 ! 

2 


lim 

A-*co 


J = 0; for t; 

r 3 1 

An equivalent condition for the vanishing of J is that 


-*“■£)]■ 

(7C.12) 

!<*. 

(7C.13) 

is that 



<“ or 

4 2 


(7C.14) 


We see from Fig. 7C.1 that the condition is met whenever 1 — 2ahi > 0 
or 


u = Re(w) < — . 

2<7 2 


(7C.15) 


An identical argument applied to J shows that this integral also 

vanishes in the limit A co whenever Eq. 7C.15 is satisfied, which com- 
pletes the proof of the lemma. 
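The lemma lends itself to a direct numerical check. The sketch below assumes the closed form $\overline{e^{wx^2}} = (1 - 2w\sigma^2)^{-1/2}\exp[wm^2/(1 - 2w\sigma^2)]$ for a Gaussian x with mean m and variance $\sigma^2$, and compares it with a trapezoidal evaluation of the defining integral for complex w with $\mathrm{Re}(w) < 1/2\sigma^2$. The integration limits, step count, and function names are illustrative choices.

```python
import cmath
from math import pi, sqrt

def lemma_closed_form(w, m, sigma):
    """Closed-form mean of exp(w x^2) for x Gaussian with mean m, variance sigma^2."""
    d = 1.0 - 2.0 * w * sigma ** 2
    # Re(d) > 0 whenever Re(w) < 1/(2 sigma^2), so the principal square root applies.
    return cmath.exp(w * m ** 2 / d) / cmath.sqrt(d)

def lemma_numeric(w, m, sigma, half_width=12.0, steps=40000):
    """Trapezoidal estimate of the defining integral over m +/- half_width*sigma."""
    lo = m - half_width * sigma
    hi = m + half_width * sigma
    h = (hi - lo) / steps
    total = 0.0 + 0.0j
    for i in range(steps + 1):
        a = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * cmath.exp(w * a * a - (a - m) ** 2 / (2.0 * sigma ** 2))
    return total * h / (sqrt(2.0 * pi) * sigma)

if __name__ == "__main__":
    for w in (0.2 + 0.7j, -0.5j, -1.0 + 0.0j):
        print(w, lemma_closed_form(w, 0.5, 1.0), lemma_numeric(w, 0.5, 1.0))
```

Purely imaginary values of w correspond to the characteristic-function use of the lemma made earlier in the chapter.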
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PROBLEMS 

7.1 A communication system uses the $\{\varphi_j(t)\}$ shown in Fig. P7.1(i) to transmit
one of two equally likely messages by means of the signals $s_1(t) = \sqrt{E_s}\,\varphi_1(t)$,
$s_2(t) = \sqrt{E_s}\,\varphi_2(t)$. The channel is illustrated in Fig. P7.1(ii).

Figure P7.1 (i) The waveforms $\{\varphi_j(t)\}$. (ii) The channel, which includes a filter h(t).

a. What is the probability of error for an optimum receiver when $E_s/\mathcal{N}_0 = 5$?

b. Give a detailed block diagram, including waveshapes in the absence
of noise, of the optimum filtered-signal receiver.


7.2 Let white noise with $\mathcal{S}_w(f) = 1$ be passed through a filter with transfer
function


H(s) = 


- 5s 2 + US - 15 . 
7+5 s 3 + S5 2 + 6s ’ 


$s = \sigma + j2\pi f$.


a. Determine the power density function $\mathcal{S}_n(f)$ of the noise n(t) at the filter
output.

b. Determine the transfer function G(f) and the impulse response g(t) of the
whitening filter with realizable inverse.

c. Discuss how G(f) might be modified in the design of a receiver for signals,
with lowpass bandwidth W = 1, corrupted by the addition of n(t).

7.3 Consider an additive Gaussian noise channel, that is, r(t) = s(t) + n(t),
where $\mathcal{S}_n(f)$ is not white. Assume the transmission of one of M equally likely
signals, each of which is identically zero for |t| > T/2.

a. Show that the optimum receiver must observe the entire received waveform
r(t), $-\infty < t < \infty$.

b. Let

    $\mathcal{S}_n(f) = \frac{\mathcal{N}_0}{2}\,\frac{f^2 + 1}{f^2 + 4}$,

and let M = 2, with

    $s(t) = \pm\sqrt{E_s}$ for $|t| < \tfrac{1}{2}$;  $s(t) = 0$ elsewhere.

Exercise engineering judgment to determine a small finite interval to which
observation of r(t) may be restricted without incurring substantial performance
degradation.

7.4 A random process n(t) is defined as

    $n(t) = n_c(t)\sqrt{2}\cos\omega_0 t + n_s(t)\sqrt{2}\sin\omega_0 t$,

in which $n_c(t)$ and $n_s(t)$ are zero-mean, jointly wide-sense stationary random
processes. In terms of

    $\mathcal{R}_c(\tau) = \overline{n_c(t)\,n_c(t - \tau)}$,
    $\mathcal{R}_s(\tau) = \overline{n_s(t)\,n_s(t - \tau)}$,
    $\mathcal{R}_{cs}(\tau) = \overline{n_c(t)\,n_s(t - \tau)}$,
    $\mathcal{R}_{sc}(\tau) = \overline{n_s(t)\,n_c(t - \tau)}$,

determine sufficient conditions for n(t) to be wide-sense stationary.

7.5 Two lowpass signals $s_1(t)$ and $s_2(t)$ of bandwidth W are DSB-SC modulated
on quadrature high-frequency carriers, and the modulated carriers are added
before transmission. Thus the transmitted signal is

    $s^{\circ}(t) = s_1(t)\sqrt{2}\cos 2\pi f_0 t + s_2(t)\sqrt{2}\sin 2\pi f_0 t$.

The channel is shown in Fig. P7.5, where h(t) is the impulse response of a linear
time-invariant filter with transfer function H(f). Prove that the condition

    $H(f_0 + f) = H^*(f_0 - f)$;  $0 < f < W$

ensures that the lowpass signals regained by quadrature DSB-SC demodulation
do not contain crosstalk (i.e., energy from one signal in the output of the
demodulator for the other).

Figure P7.5


7.6 Because ideal rectangular filters are unrealizable, SSB is not possible in
practice when the baseband signals (of bandwidth W) contain significant energy
in the immediate vicinity of f = 0. The difficulty is often evaded by use of
“vestigial sideband” modulation. Show that the idealized vestigial sideband
system diagrammed in Fig. P7.6 also yields error performance equal to that
of DSB-SC when H(f) is appropriately chosen and $n_w(t)$ is Gaussian. What
conditions must H(f) satisfy over $[f_0 - \Delta, f_0 + \Delta]$?

Figure P7.6


7.7 When s(t) is transmitted over a particular fading channel, the received
signal is

    $r(t) = a\,s(t) + n_w(t)$,

where

    $p_a(\alpha) = 0.01\,\delta(\alpha) + 0.09\,\delta(\alpha - 1) + 0.9\,\delta(\alpha - 2)$

and $n_w(t)$ is white Gaussian noise with mean power density $\mathcal{N}_0/2$. One of two
equally likely messages is transmitted by means of antipodal signals of energy $E_s$.

a. What is the average energy of the signal component of r(t)?

b. Determine $P[\mathcal{E}]$ as a function of $E_s/\mathcal{N}_0$.

c. What value does $P[\mathcal{E}]$ approach as $E_s/\mathcal{N}_0$ becomes very large?

7.8 After observing the received waveform, the optimum receiver for a binary
communication system calculates a decision variable, say y, and sets $\hat{m} = m_0$ if
y > 0 and $\hat{m} = m_1$ if y < 0. Assume that the signaling is symmetric in the sense
that

    $p_y(\alpha \mid m_0) = p_y(-\alpha \mid m_1)$;  all $\alpha$.

Show in consequence that

    $P[\mathcal{E}] = \overline{f(y)} \le \overline{e^{\lambda y}}$  for all $\lambda$, $-\infty < \lambda < \infty$,

where f is the unit step function and the expectation is conditioned on $m_1$.
Hints. Consider the necessity of the condition

    $p_y(-\alpha \mid m_1) \ge p_y(\alpha \mid m_1)$;  for all $\alpha > 0$,

and note that for any probability density function $p(\alpha)$

    $\int_{-\infty}^{\infty} p(\alpha)\,e^{\lambda\alpha}\,d\alpha = \int_0^{\infty} [p(-\alpha) - p(\alpha)]\,e^{-\lambda\alpha}\,d\alpha + \int_0^{\infty} p(\alpha)\,[e^{\lambda\alpha} + e^{-\lambda\alpha}]\,d\alpha$.

7.9 One of M equally likely messages is communicated over a random-phase,
additive white Gaussian noise channel by means of M orthogonal signals, each
of energy $E_s$. The noise power density is $\mathcal{N}_0/2$.

a. Draw a block diagram of the optimum receiver.

b. Show that when the channel phase $\theta$ is equal to $\gamma$ the conditional error
probability may be written as

    $P[\mathcal{E} \mid \theta = \gamma] = 1 - \overline{\left[1 - e^{-(x^2 + y^2)/2}\right]^{M-1}}$.

Here x and y are statistically independent Gaussian random variables with unit
variance and means equal to $\sqrt{2E_s/\mathcal{N}_0}\cos\gamma$ and $\sqrt{2E_s/\mathcal{N}_0}\sin\gamma$, respectively;
the expectation is with respect to x and y, conditioned on $\theta = \gamma$.

c. Using the lemma of Eq. 7.67, show that

    $P[\mathcal{E}] = \sum_{i=1}^{M-1} \binom{M-1}{i}\,\frac{(-1)^{i+1}}{i+1}\,\exp\left[-\frac{i}{i+1}\,\frac{E_s}{\mathcal{N}_0}\right]$,

which is a generalization of the result of Eq. 7.68 for the case M = 2.

7.10 Consider signaling over a bandpass, known-phase, additive white Gaussian
noise channel with codewords that are sequences chosen from an A-letter alphabet
of orthogonal waveforms with equal energy. Assume that the letters are chosen
with equal probability and statistical independence. Show that the resulting
value of the (unquantized) exponential bound parameter $R_0$ is the same for
either a coherent or an incoherent receiver. Would this be true if the letters
formed an A-waveform simplex? (Assume that the channel phase is constant.)

7.11 A random phase, nonfading, additive white Gaussian noise channel is 
converted to a discrete channel by the following modulation/demodulation 
scheme. Each use of the discrete channel corresponds to the transmission of a 
waveform chosen from a four-letter alphabet of orthogonal waveforms, each 
with energy $E_s$. The channel phase is constant over each use of the channel,
uniformly distributed, and statistically independent from use to use. At the
receiver maximum-likelihood (incoherent) detection is performed on each letter. 
The detector outputs are then passed through a “quantizer” whose output is an 
ordered pair of integers, the first indicating the letter whose detector output 
(likelihood) is largest and the second indicating the letter whose detector output 
is second largest. (See also Problem 6.10.) 

a. Verify that the following transition probability matrix describes the discrete 

channel. 

OUTPUT 



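b. Express $R_0'$ for the discrete channel in terms of q, w, and p.

c. Show that

    $q = \tfrac{1}{3}\,\overline{\left[1 - e^{-(x^2 + y^2)/2}\right]^3}$,
    $w = \overline{\left[1 - e^{-(x^2 + y^2)/2}\right]^2\left[e^{-(x^2 + y^2)/2}\right]}$,
    $p = (1 - 3q - 3w)/6$,

where x and y are statistically independent unit-variance Gaussian random
variables with $\bar{x} = \sqrt{2E_s/\mathcal{N}_0}$, $\bar{y} = 0$.

d. Use the lemma of Eq. 7.67 to evaluate q and w in terms of $E_s/\mathcal{N}_0$.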

7.12 A communication system operating on the random phase, additive white
Gaussian noise channel of Problem 7.11 uses codewords constructed from the
two orthogonal waveforms $\sqrt{E_s}\,\varphi_0(t)$, $\sqrt{E_s}\,\varphi_1(t)$. The receiver utilizes a
maximum-likelihood detector followed by a three-level quantizer. Whenever both

JVL 

(r t -<Po) s + ('.-'CD) ! <r» T “ 

and 

(rc-^ + Cr 

in which T is a preset threshold, the quantizer output is the erasure symbol.

Otherwise the quantizer output is the letter with largest likelihood.

a. Show that the system can be modeled by the discrete channel illustrated in
Fig. P7.12.

Figure P7.12


b. Show that

    $w = (1 - e^{-T^2/2})(1 - P[x_1^2 + x_2^2 > T^2])$,
    $q = P[(x_1^2 + x_2^2 > T^2) \cap (x_1^2 + x_2^2 > x_3^2 + x_4^2)]$,
    $p = 1 - q - w$,

in which $x_1, x_2, x_3, x_4$ are statistically independent unit-variance Gaussian random
variables and

    $\bar{x}_1^2 + \bar{x}_2^2 = 2E_s/\mathcal{N}_0$,  $\bar{x}_3 = \bar{x}_4 = 0$.

c. Express $P[x_1^2 + x_2^2 > T^2]$ in terms of the Marcum Q-function,† defined as

    $Q(m, T) = \int_T^{\infty} r\,e^{-(r^2 + m^2)/2}\,I_0(mr)\,dr$.

[$I_0(\ )$ is the zero-order modified Bessel function of Eq. 7.49a.]

d. Similarly, express q as a single integral involving $I_0(\ )$.

e. Express $R_0'$ for the discrete channel in terms of p, q, and w.

7.13 Consider a bandpass fading channel and DSB-SC sine-and-cosine
demodulation. If a lowpass signal s(t) is transmitted, the two demodulator outputs
are

    $r_c(t) = z_c\,s(t) + n_c(t)$,
    $r_s(t) = z_s\,s(t) + n_s(t)$,

in which $n_c(t)$ and $n_s(t)$ are statistically independent zero-mean Gaussian noise
processes with power spectrum $\mathcal{N}_0/2$ over the frequency band |f| < W occupied
by s(t).

Heretofore we have considered only the case in which $z_c$ and $z_s$ are zero-mean
Gaussian random variables, each with variance b/2. If the channel model is
modified so that

    $\bar{z}_c = \alpha\cos\theta$,  $\bar{z}_s = \alpha\sin\theta$,

the fading is called “Rician.” Here $\alpha$ and $\theta$ are known constants.

† J. I. Marcum, “A Statistical Theory of Target Detection by Pulsed Radar,” IRE
Trans. Inform. Theory, IT-6, 159, April 1960.

a. Show that the Rician model corresponds to a received signal that (in the
absence of noise) is the resultant of two components, one specular and the other
Rayleigh. Represent the received signal by a phasor diagram.

b. Derive a block diagram of the optimum receiver when s(t) is one of two
equally likely orthogonal signals, each of energy $E_s$.

c. Now consider instead the receiver which would be optimum if $\alpha$ were zero.
Show that the error probability produced by this receiver is

    $P[\mathcal{E}] = p\,\exp\left(-p\,\alpha^2 E_s/\mathcal{N}_0\right)$,


in which 



7.14 One of two equally likely binary messages is communicated over a
random-phase, additive white Gaussian noise channel by means of the signals

    $s_1(t) = q(t) + s_0(t)$,
    $s_2(t) = q(t) - s_0(t)$.

The waveforms q(t) and $s_0(t)$ are orthogonal, with energy $E_q$ and $E_0$, respectively
(E 0 E a ). The channel phase is uniformly distributed and the noise power
density is $\mathcal{N}_0/2$.

a. Is the receiver optimum which estimates channel phase from q(t) and uses
this estimate to make a “coherent” decision between $\pm s_0(t)$?

b. Specify the error probability of the optimum receiver in terms of the
Marcum Q-function (see Problem 7.12, part c). You will also need to use the
following result: if $x_1, x_2, x_3, x_4$ are statistically independent unit-variance
Gaussian random variables,† then

    $P[x_1^2 + x_2^2 > x_3^2 + x_4^2] = \tfrac{1}{2}[1 - Q(b, a) + Q(a, b)]$,

in which

    $a^2 = \frac{\bar{x}_1^2 + \bar{x}_2^2}{2}$,  $b^2 = \frac{\bar{x}_3^2 + \bar{x}_4^2}{2}$.


† S. Stein, “Unified Analysis of Certain Coherent and Non-Coherent Binary
Communications Systems,” IEEE Trans. Inform. Theory, IT-10, 43-51, January 1964.

Waveform Communication


In the preceding chapters we have considered discrete communication 
systems, in which the transmitter input is chosen from a finite set of 
possible messages. The communication problem is somewhat different 
when the set of possible input messages is defined on a continuum. For 
example, consider the system shown in Fig. 8.1. A random variable, m, is 



Figure 8.1 Communication of a continuous random variable. 


presented to the transmitter and a waveform $s_m(t)$, some attribute of
which depends on m, is transmitted over the noisy channel. After
observing the received signal, r(t), the receiver must deliver at its output an
estimate, $\hat{m}$, of m. The essential difference between this system and those
already considered is that m is now assumed to be a continuous rather
than a discrete random variable. For example, we might assume that m
can take on any value between −1 and 1, whereas in the corresponding
discrete case m might be restricted to the values

    $\{\pm 2i/M\}$,  $i = 0, 1, \ldots, M/2$.

In an operable discrete communication system the probability that the
receiver output $\hat{m}$ will equal the transmitter input m is close to unity, and
it is meaningful to measure the system performance in terms of the
probability of error. In the continuum case, however, the probability
that $\hat{m} = m$ is, in general, zero; small noise perturbations produce
changes in the received signal which are indistinguishable from those 
produced by small variations in the input m itself. When m is a continuous 
random variable, it is not meaningful to judge the performance of a 
communication system on the basis of probability of error. Some other 
criterion of goodness is required. 

In engineering practice the main attributes of a desirable criterion of 
goodness are that it should be mathematically tractable, that it should 
point the way to efficient system designs, and that it should accurately 
reflect the degree of user satisfaction with the system. In actual problems 
of continuum communication, however, such as the transmission of 
speech, it is exceedingly difficult to devise a performance criterion that 
simultaneously satisfies all three of the desirable attributes listed. The 
essential difficulty is that entirely different speech waveforms may be 
subjectively equivalent to a listener, but the rules defining the equivalence 
relations are not understood well enough to permit full exploitation in 
achieving a maximally efficient system design. 

As a consequence the historical approach has been to attempt to repro- 
duce at the receiver output a waveform that is a faithful replica of the 
transmitter input. Such a criterion is clearly sufficient, since a high 
fidelity obviously does lead to user satisfaction. On the other hand, a 
system designed on this basis may be inefficient in the sense that more 
transmitter power may be required than if it had been possible to design 
around a less stringent but subjectively equivalent criterion. 

We follow the classical approach here and assume that fidelity of 
waveform reproduction is our communication objective. How to measure 
“fidelity” is then the problem. The requirement of mathematical tracta- 
bility has been of paramount importance historically and for systems 
disturbed by additive white Gaussian noise has led to the acceptance of 
the mean-square error between input and output waveforms as the criterion 
of goodness. We shall see that this criterion also meets the objective of 
leading to useful design procedures. For the single random variable
communication system of Fig. 8.1 the mean-square error is defined by the
equation

    $\overline{\epsilon^2} \triangleq E[(m - \hat{m})^2] = \overline{(m - \hat{m})^2}$.    (8.1)

The expectation is taken over the joint ensemble of all allowable inputs
and all allowable noise disturbances.

We begin by analyzing certain continuum communication schemes to
determine the mean-square error when the channel is perturbed with
additive white Gaussian noise. The analysis procedure is first to consider
a single random variable input and then to apply the results to a random
waveform input. After illustrating how continuum (as well as discrete)
communication is constrained by channel capacity, we return to discrete
systems in the consideration of pulse-code modulation, abbreviated PCM.


8.1 LINEAR MODULATION 

Various devices are used in practice to generate transmitted waveforms.
An important class, called linear modulators, generates waveforms that
vary linearly with the transmitter input. This class includes
double-sideband (DSB), double-sideband suppressed-carrier (DSB-SC), and
single-sideband (SSB) systems.

Single-Parameter Input 

Consider the communication system illustrated in Fig. 8.2. The trans- 
mitted waveform is given by 

    $s_m(t) = mA\,\varphi_1(t)$,    (8.2)

Figure 8.2 System using a linear modulator to communicate the random variable m.


in which A is the voltage gain of the transmitter amplifier and $\varphi_1(t)$ is
some waveform with unit energy,

    $\int_{-\infty}^{\infty} \varphi_1^2(t)\,dt = 1$.    (8.3)

The transmission is disturbed by an additive white Gaussian noise process
$n_w(t)$ with power density $\mathcal{N}_0/2$. Thus the received signal is

    $r(t) = s_m(t) + n_w(t)$.    (8.4)

Our first task is to determine the structure of the least mean-square error
receiver and evaluate its performance.

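Mean-square error. In formulating the receiver design problem we
assume that the received random process r(t) is represented by some vector
$\mathbf{r}$. In order that the over-all mean-square error (averaged over all possible
pairs of transmitted and received vectors) may be minimum, it is clearly
necessary and sufficient that the conditional mean-square error, $\overline{\epsilon^2}(\boldsymbol{\rho})$,
given each possible value $\boldsymbol{\rho}$ of the received vector $\mathbf{r}$, should be minimum.

Figure 8.3 Possible a posteriori density function $p_m(\alpha \mid \mathbf{r} = \boldsymbol{\rho})$ with conditional mean $\bar{m}(\boldsymbol{\rho})$.

This follows from the fact that the over-all mean-square error can be
written

    $\overline{\epsilon^2} = \int_{-\infty}^{\infty} \overline{\epsilon^2}(\boldsymbol{\rho})\,p_{\mathbf{r}}(\boldsymbol{\rho})\,d\boldsymbol{\rho}$    (8.5)

with

    $\overline{\epsilon^2}(\boldsymbol{\rho}) \triangleq \int_{-\infty}^{\infty} (\alpha - \hat{m})^2\,p_m(\alpha \mid \mathbf{r} = \boldsymbol{\rho})\,d\alpha \triangleq E[(m - \hat{m})^2 \mid \mathbf{r} = \boldsymbol{\rho}]$.    (8.6)

As in Chapter 7, E[ | ] is used to denote a conditional expectation.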

Equations 8.5 and 8.6 give an analytic expression for $\overline{\epsilon^2}$ in terms of $\hat{m}$;
the remaining problem is to find a rule for determining the receiver
estimate $\hat{m}$ in such a way that $\overline{\epsilon^2}$ will be minimized. Figure 8.3 illustrates
a typical a posteriori density function, $p_{m|\mathbf{r}}$, with conditional mean

    $\bar{m}(\boldsymbol{\rho}) \triangleq E[m \mid \mathbf{r} = \boldsymbol{\rho}]$.    (8.7)

We now show that the assignment $\hat{m} = \bar{m}(\boldsymbol{\rho})$ minimizes $\overline{\epsilon^2}(\boldsymbol{\rho})$. The
argument is identical to one already encountered in Section 7.4, Receiver
Interpretation. Assume that the receiver assigns any other number to
$\hat{m}$, say

    $\hat{m} = \bar{m}(\boldsymbol{\rho}) + \Delta$.

Taking the expectation over the ensemble of all possible values of m, we
have (as in Eq. 7.112)

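    $\overline{\epsilon^2}(\boldsymbol{\rho}) = E[(m - \bar{m}(\boldsymbol{\rho}) - \Delta)^2 \mid \mathbf{r} = \boldsymbol{\rho}]$
    $= E[(m - \bar{m}(\boldsymbol{\rho}))^2 \mid \mathbf{r} = \boldsymbol{\rho}] - 2\Delta\,E[m - \bar{m}(\boldsymbol{\rho}) \mid \mathbf{r} = \boldsymbol{\rho}] + \Delta^2$
    $= E[(m - \bar{m}(\boldsymbol{\rho}))^2 \mid \mathbf{r} = \boldsymbol{\rho}] + \Delta^2$.

The middle term vanishes because $E[m - \bar{m}(\boldsymbol{\rho}) \mid \mathbf{r} = \boldsymbol{\rho}] = 0$ by the definition
of the conditional mean. Clearly, $\overline{\epsilon^2}(\boldsymbol{\rho})$ is minimized by choosing $\Delta = 0$,
that is, by assigning

    $\hat{m} = \bar{m}(\boldsymbol{\rho})$.    (8.8a)

Thus the least mean-square-error estimate of m (given $\mathbf{r} = \boldsymbol{\rho}$) is the
conditional mean and the resulting mean-square error is the conditional
variance

    $\overline{\epsilon^2}(\boldsymbol{\rho}) = E[(m - \bar{m}(\boldsymbol{\rho}))^2 \mid \mathbf{r} = \boldsymbol{\rho}]$.    (8.8b)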

Since the derivation of Eqs. 8.8a and b involves neither the assumption 
that the modulation is linear nor that the channel is Gaussian, Eqs. 8.8 
have unrestricted validity. 
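
Because Eqs. 8.8 hold for any a posteriori density, they can be checked on an arbitrary example. The sketch below uses a small discrete a posteriori distribution (the values and probabilities are made up purely for illustration) and confirms that an offset $\Delta$ from the conditional mean raises the conditional mean-square error by exactly $\Delta^2$.

```python
def conditional_mse(estimate, values, probs):
    """E[(m - estimate)^2 | r = rho] for a discrete a posteriori distribution."""
    return sum(p * (a - estimate) ** 2 for a, p in zip(values, probs))


# Hypothetical a posteriori distribution of m given r = rho (illustrative numbers).
values = [-1.0, 0.0, 0.5, 2.0]
probs = [0.1, 0.4, 0.3, 0.2]

cond_mean = sum(p * a for a, p in zip(values, probs))   # conditional mean, Eq. 8.7
cond_var = conditional_mse(cond_mean, values, probs)    # conditional variance, Eq. 8.8b

for delta in (-1.0, -0.25, 0.0, 0.25, 1.0):
    mse = conditional_mse(cond_mean + delta, values, probs)
    print(f"delta={delta:+.2f}  mse={mse:.4f}  cond_var+delta^2={cond_var + delta ** 2:.4f}")
```

The two printed columns agree for every offset, so the smallest error occurs at $\Delta = 0$.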

In the particular case of linear modulation and additive white Gaussian
noise let $\varphi_1(t)$ define one axis of the signal space; the transmitted signal is
then represented by the vector $mA\boldsymbol{\varphi}_1$:

    $s_m(t) = mA\,\varphi_1(t) \leftrightarrow \mathbf{s}_m = mA\,\boldsymbol{\varphi}_1$.    (8.9a)

If we now let $n_1(t)$ denote the component of $n_w(t)$ along $\varphi_1(t)$, then $n_1(t)$ is
also represented by a one-dimensional vector,

    $n_1(t) = n_1\,\varphi_1(t) \leftrightarrow \mathbf{n}_1 = n_1\,\boldsymbol{\varphi}_1$,    (8.9b)

where

    $n_1 = \int_{-\infty}^{\infty} n_w(t)\,\varphi_1(t)\,dt$.    (8.9c)

Finally, if we let $n^*(t)$ denote the rest of the noise,

    $n^*(t) = n_w(t) - n_1(t) = r(t) - [s_m(t) + n_1(t)]$,

it follows from Eqs. 4.45b, 4.46 and 4.25b that any vector $\mathbf{n}^*$ formed from
$n^*(t)$ is statistically independent of both $\mathbf{s}_m$ and $\mathbf{n}_1$ and is therefore
irrelevant; we have

    $p_{m|\mathbf{r}} = p_{m|r_1}$,    (8.10a)

in which

    $r_1 \triangleq \mathbf{r}\cdot\boldsymbol{\varphi}_1 = \int_{-\infty}^{\infty} r(t)\,\varphi_1(t)\,dt = mA + n_1$.    (8.10b)

Hence the vector notation of Eqs. 8.8a and b is superfluous: the least
mean-square-error estimate of m may be written as

    $\hat{m} = \bar{m}(\rho) = \int_{-\infty}^{\infty} \alpha\,p_m(\alpha \mid r_1 = \rho)\,d\alpha$    (8.11a)

and the resulting mean-square error as

    $\overline{\epsilon^2}(\rho) = E[(m - \bar{m}(\rho))^2 \mid r_1 = \rho]$.    (8.11b)

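The conditional density function $p_{m|r_1}$ is given by Bayes rule: for all
$\alpha$ and $\rho$

    $p_m(\alpha \mid r_1 = \rho) = \frac{p_m(\alpha)}{p_{r_1}(\rho)}\,p_{r_1}(\rho \mid m = \alpha)$.    (8.12)

But $r_1 = mA + n_1$, and $n_1$ is a zero-mean Gaussian random variable
(with variance $\mathcal{N}_0/2$) that is statistically independent of m. Hence

    $p_{r_1}(\rho \mid m = \alpha) = p_{n_1}(\rho - \alpha A) = \frac{1}{\sqrt{\pi\mathcal{N}_0}}\,e^{-(\rho - \alpha A)^2/\mathcal{N}_0}$.

Substituting in Eq. 8.12, we have

    $p_m(\alpha \mid r_1 = \rho) = B_1\,p_m(\alpha)\,e^{-(\rho - \alpha A)^2/\mathcal{N}_0}$.    (8.13)

Here $B_1$ is a constant (independent of $\alpha$) that normalizes the integral of
$p_{m|r_1}$ to unity.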

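Further analysis depends critically on the a priori probability density
function $p_m$. The simplest situation occurs when m is also a Gaussian
random variable, say with zero mean and variance $\sigma^2$:

    $p_m(\alpha) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\alpha^2/2\sigma^2}$.

Then

    $p_m(\alpha \mid r_1 = \rho) = \frac{B_1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(\rho - \alpha A)^2}{\mathcal{N}_0} - \frac{\alpha^2}{2\sigma^2}\right]$.

After completing the square in the exponent, we obtain

    $p_m(\alpha \mid r_1 = \rho) = B_2\exp\left[-\frac{1}{2}\,\frac{A^2\sigma^2 + \mathcal{N}_0/2}{\sigma^2\mathcal{N}_0/2}\left(\alpha - \frac{\rho\,\sigma^2 A}{\sigma^2 A^2 + \mathcal{N}_0/2}\right)^2\right]$.

Since the functional dependence on $\alpha$ is of Gaussian form, the a posteriori
density function $p_{m|r_1}$ (as well as the a priori density $p_m$) is Gaussian. It
follows that the normalizing constant $B_2$ is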


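    $B_2 = \left[2\pi\,\frac{\sigma^2\mathcal{N}_0/2}{A^2\sigma^2 + \mathcal{N}_0/2}\right]^{-1/2}$.

The conditional mean is identified as

    $\bar{m}(\rho) = \frac{\rho}{A + \dfrac{\mathcal{N}_0/2}{\sigma^2 A}}$,    (8.16a)

and the mean-square error when $\hat{m} = \bar{m}(\rho)$ is

    $\overline{\epsilon^2}(\rho) = \sigma^2\,\frac{\mathcal{N}_0/2}{A^2\sigma^2 + \mathcal{N}_0/2} = \sigma^2\,\frac{1}{1 + 2A^2\sigma^2/\mathcal{N}_0}$.    (8.16b)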


We note from Eq. 8.16b that the transmission of $s_m(t)$ yields an a
posteriori variance of m that is reduced from the a priori value, $\sigma^2$, by the
factor $(1 + 2E_m/\mathcal{N}_0)^{-1}$, where

    $E_m \triangleq E\left[\int_{-\infty}^{\infty} s_m^2(t)\,dt\right] = A^2\,\overline{m^2} = \sigma^2 A^2$    (8.17)

is the mean signal energy. Perfect communication is obtained only in the
limit as $E_m/\mathcal{N}_0 \to \infty$. Since $\overline{\epsilon^2}(\rho)$ is independent of $\rho$, it follows from
Eq. 8.5 that the right-hand side of Eq. 8.16b is also the over-all
mean-square error. We therefore have

    $\overline{\epsilon^2} = \overline{m^2}\,\frac{1}{1 + 2E_m/\mathcal{N}_0}$.    (8.18)
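
Equation 8.18 can be checked by simulation. The following is a minimal Monte Carlo sketch, assuming only the scalar model $r_1 = mA + n_1$ of Eq. 8.10b; the function names, parameter values, and seed are illustrative choices.

```python
import math
import random


def simulate_gaussian_mmse(sigma, A, N0, trials=200_000, seed=1):
    """Empirical mean-square error of the conditional-mean receiver of Eq. 8.16a."""
    rng = random.Random(seed)
    noise_std = math.sqrt(N0 / 2)                        # n_1 has variance N0/2
    gain = sigma ** 2 * A / (sigma ** 2 * A ** 2 + N0 / 2)  # m_bar(rho) = gain * rho
    total = 0.0
    for _ in range(trials):
        m = rng.gauss(0.0, sigma)                        # Gaussian a priori density
        rho = m * A + rng.gauss(0.0, noise_std)          # r_1 = m A + n_1
        total += (m - gain * rho) ** 2
    return total / trials


def theoretical_mse(sigma, A, N0):
    """Eq. 8.18 with mean-square input sigma^2 and E_m = sigma^2 A^2."""
    E_m = sigma ** 2 * A ** 2
    return sigma ** 2 / (1 + 2 * E_m / N0)


print(simulate_gaussian_mmse(1.0, 2.0, 2.0), theoretical_mse(1.0, 2.0, 2.0))
```

With $\sigma = 1$, $A = 2$, and $\mathcal{N}_0 = 2$ the energy-to-noise ratio is $E_m/\mathcal{N}_0 = 2$, so the a priori variance is reduced by the factor 1/5.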

Minimax reception. We recognize from Eq. 8.16a that $\bar{m}(\rho)$ is a linear
function of the relevant received signal component $\rho$. Thus the minimum
mean-square error receiver when $p_m$ is Gaussian is a linear receiver. A
receiver realization is shown in Fig. 8.4. It consists of a filter matched to
$\varphi_1(t)$ followed by an attenuator with gain

    $G = \frac{1}{A}\,\frac{2E_m/\mathcal{N}_0}{1 + 2E_m/\mathcal{N}_0}$.    (8.19)

Figure 8.4 Minimum mean-square-error receiver when $p_m$ is Gaussian: a filter with
impulse response $h(t) = \varphi_1(T - t)$, sampled at t = T and followed by the gain G of
Eq. 8.19. This receiver is also minimax.


Of course, the minimum mean-square error receiver is not linear when
$p_m$ is not Gaussian, even though the transmitted signal $s_m(t)$ depends
linearly on m. This follows from the fact that $p_{m|r_1}$ in Eq. 8.12 involves
$p_m$, as well as $p_{r_1|m}$. But it is easy to show that the linear receiver of Fig.
8.4 is minimax in the sense that no other receiver yields a smaller
mean-square error when the modulation is linear and $p_m$ is most adverse. In
particular, we have already observed that no other receiver performs as
well when $p_m$ is Gaussian. The minimax claim may therefore be proved
by showing that the receiver of Fig. 8.4 produces the same mean-square
error for every $p_m$ with second moment $\overline{m^2}$.

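Proof follows directly from Eq. 8.10b and Fig. 8.4. We have

    $\hat{m} = (mA + n_1)G$,
    $(m - \hat{m}) = m(1 - AG) - n_1 G$.    (8.20a)

Averaging over the joint ensemble of m and $n_1$ yields

    $\overline{\epsilon^2} \triangleq \overline{(m - \hat{m})^2} = \overline{m^2}\,(1 - AG)^2 + \overline{n_1^2}\,G^2$,    (8.20b)

in which the second equality follows from the statistical independence of
m and $n_1$ (the cross term vanishes because $n_1$ has zero mean). But

    $G = \frac{1}{A}\,\frac{2E_m/\mathcal{N}_0}{1 + 2E_m/\mathcal{N}_0}$

and

    $1 - AG = \frac{1}{1 + 2E_m/\mathcal{N}_0}$.

Substituting in Eq. 8.20b, with $\overline{n_1^2} = \mathcal{N}_0/2$ and $E_m = A^2\,\overline{m^2}$, and
simplifying again yields

    $\overline{\epsilon^2} = \overline{m^2}\,\frac{1}{1 + 2E_m/\mathcal{N}_0}$,

which agrees with the Gaussian result of Eq. 8.18 and is independent of the
functional form of $p_m$.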
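
The independence from the form of $p_m$ can be illustrated numerically. The sketch below, which assumes only the scalar model of Eq. 8.10b and the gain of Eq. 8.19, applies the linear receiver to three inputs with the same second moment; the three input distributions, the parameter values, and the names are illustrative.

```python
import math
import random


def linear_receiver_mse(sample_m, A, N0, E_m, trials=100_000, seed=2):
    """Empirical mean-square error of the linear receiver for any input sampler."""
    rng = random.Random(seed)
    x = 2 * E_m / N0
    G = (1 / A) * x / (1 + x)                    # attenuator gain, Eq. 8.19
    noise_std = math.sqrt(N0 / 2)
    total = 0.0
    for _ in range(trials):
        m = sample_m(rng)
        m_hat = (m * A + rng.gauss(0.0, noise_std)) * G
        total += (m - m_hat) ** 2
    return total / trials


A, N0 = 2.0, 2.0
E_m = A ** 2 * 1.0                               # every input below has second moment 1
inputs = {
    "gaussian": lambda rng: rng.gauss(0.0, 1.0),
    "uniform": lambda rng: rng.uniform(-math.sqrt(3), math.sqrt(3)),
    "binary": lambda rng: rng.choice((-1.0, 1.0)),
}
results = {name: linear_receiver_mse(f, A, N0, E_m) for name, f in inputs.items()}
for name, mse in results.items():
    print(f"{name:8s} mse = {mse:.4f}   Eq. 8.18 gives {1.0 / (1 + 2 * E_m / N0):.4f}")
```

All three estimates cluster around the same value, although for the non-Gaussian inputs a nonlinear receiver could do better than this linear one.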

A convenient measure of system performance is the signal-to-noise
power ratio, $\mathcal{S}/\mathcal{N}$, defined as

    $\frac{\mathcal{S}}{\mathcal{N}} \triangleq \frac{\overline{m^2}}{\overline{\epsilon^2}}$.    (8.21a)

From Eq. 8.18,

    $\frac{\mathcal{S}}{\mathcal{N}} = 1 + \frac{2E_m}{\mathcal{N}_0}$.    (8.21b)

Acceptable communication is possible with minimax reception if and only
if the available mean energy-to-noise ratio can be made relatively large,
say $E_m/\mathcal{N}_0 > 5$.

Maximum-likelihood reception. A receiver that considers all allowable
values of m and assigns $\hat{m}$ from among them in such a way that

    $p_{\mathbf{r}}(\boldsymbol{\rho} \mid m = \hat{m}) \ge p_{\mathbf{r}}(\boldsymbol{\rho} \mid m = \alpha)$;  all allowable $\alpha$    (8.22)

is called a maximum-likelihood receiver. For linear modulation we have
seen that

    $p_{\mathbf{r}}(\boldsymbol{\rho} \mid m = \alpha) \sim p_{r_1}(\rho \mid m = \alpha) = \frac{1}{\sqrt{\pi\mathcal{N}_0}}\,e^{-(\rho - \alpha A)^2/\mathcal{N}_0}$,    (8.23)

so that the value of $\alpha$ for which the right-hand side of Eq. 8.23 is maximum
is $\rho/A$. Thus a maximum-likelihood receiver sets

    $\hat{m} = \frac{\rho}{A}$    (8.24)

when $r_1 = \rho$ and when the allowable range of m is unrestricted.† The
resulting receiver structure, shown in Fig. 8.5, is identical to that of
Fig. 8.4 except that the gain G is now chosen to be 1/A regardless of
the value of $\mathcal{N}_0/2$.

Figure 8.5 Maximum-likelihood receiver when $p_m(\alpha) > 0$ for $-\infty < \alpha < \infty$.

The mean-square error for the receiver of Fig. 8.5 is readily determined.
We have

    $\hat{m} = (mA + n_1)\frac{1}{A} = m + \frac{n_1}{A}$,
    $\overline{\epsilon^2} = \overline{(m - \hat{m})^2} = \frac{\mathcal{N}_0}{2A^2}$.    (8.25a)

Because $A^2 = E_m/\overline{m^2}$,

    $\frac{\mathcal{S}}{\mathcal{N}} = \frac{2E_m}{\mathcal{N}_0}$.    (8.25b)

† A priori density functions $p_m$ that restrict the range of m to a finite interval of the
real line are considered in the next section.

These results are similar to those of Eqs. 8.18 and 8.21 for the minimax
receiver; both are independent of the form of $p_m$.

We conclude from Eq. 8.25 that maximum-likelihood reception is
essentially minimax for large values of mean energy-to-noise ratio
$E_m/\mathcal{N}_0$, a condition that we have already noted is necessary for good
communication. The conclusion follows from observing that the minimax
mean-square error of Eq. 8.18 is obtained by multiplying the
maximum-likelihood mean-square error of Eq. 8.25a by the factor

    $\frac{1}{1 + \mathcal{N}_0/2E_m}$,    (8.25c)

which approaches unity as $E_m/\mathcal{N}_0$ becomes large.
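
The relation between the two mean-square errors is easy to tabulate. The short sketch below evaluates Eqs. 8.18, 8.25a, and 8.25c directly; the function names and sample values are illustrative.

```python
def ml_mse(A, N0):
    """Maximum-likelihood mean-square error, Eq. 8.25a."""
    return N0 / (2 * A ** 2)


def minimax_mse(m2, A, N0):
    """Minimax (linear receiver) mean-square error, Eq. 8.18, with E_m = m2 * A^2."""
    E_m = m2 * A ** 2
    return m2 / (1 + 2 * E_m / N0)


def minimax_to_ml_factor(E_m_over_N0):
    """Factor of Eq. 8.25c relating the two errors."""
    return 1 / (1 + 1 / (2 * E_m_over_N0))


m2, N0 = 1.0, 2.0
for ratio in (0.5, 1.0, 5.0, 50.0):              # values of E_m / N0
    A = (ratio * N0 / m2) ** 0.5
    print(ratio, minimax_mse(m2, A, N0) / ml_mse(A, N0), minimax_to_ml_factor(ratio))
```

At $E_m/\mathcal{N}_0 = 5$ the factor is already 10/11, so the penalty for using the maximum-likelihood receiver is under half a decibel.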

An advantage of maximum-likelihood reception is that the attenuator
gain, G = 1/A, does not involve knowledge of $\overline{m^2}$ and $\mathcal{N}_0/2$. With the
receiver of Fig. 8.4, the mean-square error may be unnecessarily degraded
if an inaccurate assumption is made about the value of these parameters.
For example, assume that G is adjusted to minimize $\overline{\epsilon^2}$ under the
assumption

    $\mathcal{S}_w(f) = \frac{\mathcal{N}_0}{2}$;  all f.    (8.26)

From Eq. 8.20, the receiver output error is then

    $\overline{\epsilon^2} = \overline{(m - \hat{m})^2} = \overline{m^2}\,(1 - AG)^2 + \overline{n_1^2}\,G^2 = \frac{\overline{m^2}}{(1 + 2E_m/\mathcal{N}_0)^2} + \overline{n_1^2}\,G^2$.

If the true noise power density on the channel is zero, so that $\overline{n_1^2} = 0$,
there is a mean-square error

    $\overline{\epsilon^2} = \frac{\overline{m^2}}{(1 + 2E_m/\mathcal{N}_0)^2}$,

which is not equal to zero even though the channel is noiseless. On the
other hand, with maximum-likelihood reception $\overline{\epsilon^2}$ approaches zero as
the channel becomes noiseless even when this occurs unbeknown to the
receiver.
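
The effect of a wrong noise assumption can be made concrete. In the sketch below the gain is designed for an assumed noise level and then evaluated, through Eq. 8.20b, at a different true noise level; the parameter names `N0_assumed` and `N0_true` are introduced here only for illustration.

```python
def mismatched_linear_mse(m2, A, N0_assumed, N0_true):
    """Eq. 8.20b with G chosen from Eq. 8.19 for N0_assumed, evaluated at N0_true."""
    E_m = m2 * A ** 2
    x = 2 * E_m / N0_assumed
    G = (1 / A) * x / (1 + x)
    return m2 * (1 - A * G) ** 2 + (N0_true / 2) * G ** 2


def ml_mse(A, N0_true):
    """Eq. 8.25a: the gain 1/A needs no knowledge of the noise level."""
    return N0_true / (2 * A ** 2)


m2, A, N0_assumed = 1.0, 2.0, 2.0
for N0_true in (2.0, 0.2, 0.0):
    print(N0_true, mismatched_linear_mse(m2, A, N0_assumed, N0_true), ml_mse(A, N0_true))
```

When the assumed and true levels agree, the linear receiver is the better of the two; as the true noise falls toward zero its error floors at $\overline{m^2}/(1 + 2E_m/\mathcal{N}_0)^2$ while the maximum-likelihood error goes to zero.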


Bounded inputs. The receiver of Fig. 8.5 is not strictly
maximum-likelihood when m is restricted to lie within a finite interval of the real line.
An example is any a priori density function such that

    $p_m(\alpha) \begin{cases} > 0; & -a < \alpha < a \\ = 0; & \text{elsewhere.} \end{cases}$    (8.28a)

Without loss of generality, we hereafter normalize the (positive) constant
a to 1, so that

    $-1 \le m \le 1$.    (8.28b)

Bounding m constrains the locus of the transmitted signal vector
$\mathbf{s}_m = mA\boldsymbol{\varphi}_1$ as shown in Fig. 8.6. As m ranges over the interval [−1, +1],
the tip of the vector $\mathbf{s}_m$ moves along the $\varphi_1$-axis from −A to +A.

The dependence of the likelihood function $p_{r_1|m}$ of Eq. 8.23 on the value
of m is plotted in Fig. 8.7 for several values of $\rho$. If $|\rho| < A$, the value of
m that maximizes $p_{r_1|m}$ is $\rho/A$. On the other hand, $\rho/A$ is not an allowable
value of m if $|\rho| > A$. It is apparent from Fig. 8.7 that when |m| is bounded
by 1 the maximum-likelihood receiver sets

    $\hat{m} = \begin{cases} +1; & \rho > A \\ \rho/A; & -A \le \rho \le A \\ -1; & \rho < -A. \end{cases}$    (8.29)

Such a receiver is diagrammed in Fig. 8.8a; it differs from the unbounded
receiver of Fig. 8.5 only by the inclusion of a saturating transducer.
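
The saturating rule of Eq. 8.29 amounts to clipping $\rho/A$ to the interval [−1, 1]. The sketch below implements it and compares, by simulation, the mean-square error with and without the transducer for an input uniform over [−1, 1]; the uniform input, parameter values, and function names are illustrative assumptions.

```python
import math
import random


def ml_estimate_bounded(rho, A):
    """Maximum-likelihood estimate of Eq. 8.29 when |m| is bounded by 1."""
    return max(-1.0, min(1.0, rho / A))


def compare_with_transducer(A, N0, trials=100_000, seed=3):
    """Return (mse without transducer, mse with transducer) for uniform m."""
    rng = random.Random(seed)
    noise_std = math.sqrt(N0 / 2)
    se_plain = se_clipped = 0.0
    for _ in range(trials):
        m = rng.uniform(-1.0, 1.0)
        rho = m * A + rng.gauss(0.0, noise_std)  # r_1 = m A + n_1
        se_plain += (m - rho / A) ** 2
        se_clipped += (m - ml_estimate_bounded(rho, A)) ** 2
    return se_plain / trials, se_clipped / trials


plain, clipped = compare_with_transducer(A=2.0, N0=2.0)
print(plain, clipped)
```

Clipping can never move the estimate farther from a value of m that lies inside [−1, 1], so the second number is never larger than the first.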




Figure 8.7 Dependence of the likelihood function on the value of m, for several
values of $\rho$:

    $p_{r_1}(\rho \mid m = \alpha) = \frac{1}{\sqrt{\pi\mathcal{N}_0}}\exp\left[-\frac{(\rho - \alpha A)^2}{\mathcal{N}_0}\right]$.

Figure 8.8 (a) Maximum-likelihood reception when m is bounded. (b) Dependence of
the likelihood functions on the value of $r_1$ and $r_1^*$, for two values of m.

The presence of the transducer reduces the mean-square error to a value
somewhat smaller than $\mathcal{N}_0/2A^2$. Why this is so is clarified in Fig. 8.8b, in
which we plot the conditional density function $p_{r_1^*|m}$ of the transducer
output as a function of $r_1^*$ for specific values of m. From Fig. 8.8a,
$E[(m - \hat{m})^2 \mid m = m_0]$ is $1/A^2$ times the second moment of $p_{r_1^*}(\ \mid m = m_0)$
around the point $m_0 A$. When $m_0$ is not too close to ±1, this second
moment is substantially equal to $\mathcal{N}_0/2$, the variance of $n_1$. As $m_0$
approaches ±1, however, the second moment of $p_{r_1^*}(\ \mid m = m_0)$
around $m_0 A$ decreases: when $m_0 = +1$, only negative values of the
relevant noise component $n_1$ contribute any error at all. The conditional
mean-square error is then approximately $\tfrac{1}{2}(\mathcal{N}_0/2A^2)$.

Because the over-all mean-square error

    $E[(m - \hat{m})^2]$

is obtained by averaging $E[(m - \hat{m})^2 \mid m = m_0]$ with respect to $p_m$, one
implication is that maximum-likelihood reception always produces

    $\overline{\epsilon^2} \le \frac{\mathcal{N}_0}{2A^2}$,    (8.30)

in which the equality holds if and only if m is unbounded. A second
implication, however, is that bounding m does not reduce $\overline{\epsilon^2}$ materially
unless $p_m$ concentrates nearly unit probability within intervals of length
$(1/A)\sqrt{\mathcal{N}_0/2}$ or so adjacent to the end points m = ±1. Unless this
condition on $p_m$ is met, we have

    $\overline{\epsilon^2} \approx \frac{\mathcal{N}_0}{2A^2}$,    (8.31a)

    $\frac{\mathcal{S}}{\mathcal{N}} \approx \frac{2E_m}{\mathcal{N}_0}$,    (8.31b)

and the effect of the transducer on $\overline{\epsilon^2}$ is negligible.

In most cases knowledge that |m| ≤ 1 may safely be ignored; that is,
the saturating transducer may be omitted from the receiver without
substantially degrading system performance. On the other hand, a priori
knowledge that $p_m$ approximates the degenerate form

    $p_m(\alpha) = \tfrac{1}{2}[\delta(\alpha - 1) + \delta(\alpha + 1)]$

should not be ignored. In this extreme case, for example, it is clear from
Fig. 8.9 that the attainable mean-square error is $4Q(A\sqrt{2/\mathcal{N}_0})$, which is
very much less than $\mathcal{N}_0/2A^2$ when $2A^2/\mathcal{N}_0$ is large. We do not treat
degenerate $p_m$ henceforth and therefore consider only those receivers that
do not incorporate a saturating transducer. It is convenient to refer to
these receivers as “maximum-likelihood” even when $p_m$ is bounded. For
such receivers Eqs. 8.31 are exact rather than approximate.

Bounded density functions are important in systems having linear
modulation because they permit a bound to be placed on the peak signal
energy. Since $s_m(t) = mA\,\varphi_1(t)$, the restriction |m| ≤ 1 guarantees that
the actual signal energy $E_m = m^2A^2$ is less than or equal to $A^2$ for all m.





Figure 8.9 A degenerate case. As shown in (b) for the case $\rho = \tfrac{1}{2}A$, the likelihood
function is larger for $\alpha = +1$ than for $\alpha = -1$ unless $\rho < 0$. The conditional
probability of this event when m = +1 is $Q(A\sqrt{2/\mathcal{N}_0})$ and the resulting error is
$(m - \hat{m})^2 = [1 - (-1)]^2 = 4$.

We hereafter denote the maximum signal energy by $E_s$. For linear
modulation with |m| ≤ 1 and maximum-likelihood (but transducerless) receivers,
we have

    $E_s = A^2$

and

    $\overline{\epsilon^2} = \frac{\mathcal{N}_0}{2A^2} = \frac{\mathcal{N}_0}{2E_s}$.    (8.32)

If $p_m$ is uniform over [−1, 1], then $\overline{m^2} = \tfrac{1}{3}$ and

    $E_m = \overline{m^2}\,A^2 = \tfrac{1}{3}E_s$.    (8.33)

The average signal energy then differs from the peak signal energy by
approximately 4.8 db.
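
The 4.8 db figure is simply the ratio $E_s/E_m = 1/\overline{m^2} = 3$ expressed in decibels. A short check, with the second moment of the uniform density obtained by numerical integration (the function name and step count are illustrative):

```python
import math


def second_moment_uniform(steps=100_000):
    """Midpoint-rule integral of alpha^2 * p_m(alpha), with p_m = 1/2 on [-1, 1]."""
    h = 2.0 / steps
    return sum((-1.0 + (k + 0.5) * h) ** 2 * 0.5 * h for k in range(steps))


m2 = second_moment_uniform()
peak_to_average_db = 10 * math.log10(1.0 / m2)   # E_s / E_m = A^2 / (m2 * A^2)
print(m2, peak_to_average_db)
```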

Sequences of Input Parameters 

The single-input parameter analysis and the mean-square-error result of 
Eq. 8.32 both extend trivially to the communication of a sequence (vector) 
of continuous random parameters, say 

m = (m 1} w 2 , • ■ • , %). 


(8.34) 
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provided that successive components of m modulate orthogonal wave- 
forms. Consider, for example, the communication system illustrated in 
Fig. 8.10«: the transmitted signal is given by 

s m (0 = A 2 rn h (p k (t), (8.35) 

fc=i 

in which the {%.(?)} are assumed to be orthonormal. The communication 
objective is to produce at the receiver output an appropriate estimate, m. 



Sample 
at t = T 


Figure 8.10 Communication of a random vector by means of linear modulation and 
maximum-likelihood reception. 

of the transmitter input vector, in which 

$$\hat{\mathbf{m}} = (\hat{m}_1, \hat{m}_2, \ldots, \hat{m}_K). \tag{8.36}$$

As usual, the relevant part of the received signal $r(t) = s_m(t) + n_w(t)$ is incorporated in the vector 

$$\mathbf{r} = (r_1, r_2, \ldots, r_K), \tag{8.37a}$$

with components 

$$r_k = \int_{-\infty}^{\infty} r(t)\,\varphi_k(t)\,dt = m_k A + n_k; \qquad k = 1, 2, \ldots, K. \tag{8.37b}$$

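Since $n_w(t)$ is white Gaussian noise, the likelihood function is 

$$p_{\mathbf{r}}(\boldsymbol{\rho} \mid \mathbf{m} = \boldsymbol{\alpha}) = \frac{1}{(\pi\mathcal{N}_0)^{K/2}}\, e^{-|\boldsymbol{\rho} - \boldsymbol{\alpha}A|^2/\mathcal{N}_0} = \prod_{k=1}^{K} \frac{1}{\sqrt{\pi\mathcal{N}_0}}\, e^{-(\rho_k - \alpha_k A)^2/\mathcal{N}_0}. \tag{8.38}$$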

It follows from the factorization of Eq. 8.38 that the vector $\boldsymbol{\alpha}$ that maximizes $p_{\mathbf{r}}(\boldsymbol{\rho} \mid \mathbf{m} = \boldsymbol{\alpha})$ has components $\{\alpha_k = \rho_k/A\}$, $k = 1, 2, \ldots, K$. A maximum-likelihood vector receiver therefore estimates each of the parameters $m_k$ separately. When $\mathbf{r} = \boldsymbol{\rho}$, it sets 

$$\hat{m}_k = \frac{\rho_k}{A}; \qquad k = 1, 2, \ldots, K. \tag{8.39}$$

Such a receiver is diagrammed in Fig. 8.10b. The maximum-likelihood sequence-of-parameters communication problem is just a sequence of independent one-parameter problems. 

An appropriate performance measure in communicating a vector of random parameters is the mean-square error per component, which we again denote $\overline{\epsilon^2}$: 

$$\overline{\epsilon^2} \triangleq \frac{1}{K}\,E\big[|\hat{\mathbf{m}} - \mathbf{m}|^2\big] = \frac{1}{K}\sum_{k=1}^{K} E\big[(\hat{m}_k - m_k)^2\big]. \tag{8.40a}$$

But Eq. 8.32 applies to each component individually, so that the receiver of Fig. 8.10b again produces 

$$\overline{\epsilon^2} = \frac{\mathcal{N}_0}{2A^2}. \tag{8.40b}$$

As before, the noise performance is essentially minimax when the energy-to-noise ratio is high. 

It is apparent from Eq. 8.39 that the maximum-likelihood receiver always acts as if the $\{m_k\}$ were statistically independent, since then only $r_k$ is relevant to the estimation of $m_k$. On the other hand, significantly improved noise performance can be obtained if it is known in advance that the $\{m_k\}$ are tightly dependent. For example, if it is known that $m_k = m_1$, $k = 2, 3, \ldots, K$, a nonmaximum-likelihood receiver can be designed to 
exploit this knowledge. 
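
A short Monte Carlo sketch can illustrate both statements. It assumes the model of Eq. 8.37b with independent Gaussian noise components of variance $\mathcal{N}_0/2$ (called `noise_var` below). The "pooled" receiver, which simply averages the $K$ separate estimates when $m_k = m_1$ is known, is a hypothetical illustration chosen for simplicity; the text does not specify which receiver it has in mind.

```python
import random

def simulate(num_trials=20000, K=4, A=2.0, noise_var=0.5, seed=1):
    """Estimate the per-component mean-square error of two receivers.

    noise_var plays the role of N0/2, the variance of each n_k.
    All K components carry the same value m_1 (the dependent case).
    """
    rng = random.Random(seed)
    sigma = noise_var ** 0.5
    sq_err_separate = 0.0   # componentwise estimates rho_k / A (Eq. 8.39)
    sq_err_pooled = 0.0     # hypothetical receiver that averages the estimates
    for _ in range(num_trials):
        m1 = rng.uniform(-1.0, 1.0)
        rho = [m1 * A + rng.gauss(0.0, sigma) for _ in range(K)]
        estimates = [r / A for r in rho]
        sq_err_separate += sum((e - m1) ** 2 for e in estimates) / K
        pooled = sum(estimates) / K
        sq_err_pooled += (pooled - m1) ** 2
    return sq_err_separate / num_trials, sq_err_pooled / num_trials

mse_separate, mse_pooled = simulate()
predicted = 0.5 / 2.0 ** 2          # N0 / (2 A^2) with N0/2 = 0.5 and A = 2
print(mse_separate, mse_pooled, predicted)
```

Under these assumptions the separate estimates should reproduce Eq. 8.40b, while the pooled estimate should reduce the error by about a factor of $K$.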

Sequence-of-parameter communication systems, such as the pulse amplitude modulation (abbreviated PAM) system of Fig. 8.10, are frequently encountered in practice. When the $\{\varphi_k(t)\}$ are given by 

$$\varphi_k(t) = \begin{cases} \sqrt{\dfrac{2}{T}}\,\cos 2\pi\!\left(f_0 + \dfrac{k}{T}\right)t; & 0 \le t \le T, \quad f_0 T = \text{an integer} \\ 0; & \text{elsewhere} \end{cases}$$

and the $\{m_k\}$ are provided by different input sources, the system is called frequency-multiplexed. If the inputs $\{m_k\}$ are chosen by sequencing 




through the different input sources, as shown in Fig. 8.11, and the orthogonal waveforms $\{\varphi_k(t)\}$ are translated pulses of duration $T/K$, 

$$\varphi_k(t) = \varphi_1\!\left(t - (k - 1)\frac{T}{K}\right),$$

the system is called time-multiplexed. The receivers used in conjunction with these systems are usually close approximations to the maximum-likelihood receiver. 

Waveform Inputs 

The problem of communicating a random waveform is intimately related to the sequence-of-parameters problem. That such a relation should exist is immediately apparent whenever the transmitter input process, say $m(t)$, can be adequately described by an equation of the form 

$$m(t) = \sum_{k=1}^{K} m_k\,\psi_k(t), \tag{8.41a}$$

in which $\{\psi_k(t)\}$ is an appropriate set of orthonormal functions.† Then $m(t)$ is specified by the random vector 

$$\mathbf{m} = (m_1, m_2, \ldots, m_K). \tag{8.41b}$$

Every input signal encountered in practice is constrained in bandwidth in one way or another. A convenient idealization of this fact is the assumption that the transmitter input process has been passed through an ideal lowpass filter with transfer function 

$$W_m(f) = \begin{cases} 1; & -W_m < f < W_m \\ 0; & \text{elsewhere.} \end{cases} \tag{8.42}$$

When this assumption is made, the choice of the $\{\psi_k(t)\}$ is particularly simple. 

Sampling. We now show that an appropriate set of orthonormal functions to represent any ideally bandlimited process $m(t)$ is defined by the equations 

$$\psi_k(t) = \psi\!\left(t - \frac{k}{2W_m}\right); \qquad k \text{ an integer,} \tag{8.43a}$$

$$\psi(t) \triangleq \sqrt{2W_m}\;\frac{\sin 2\pi W_m t}{2\pi W_m t}. \tag{8.43b}$$

Proof of the foregoing statement involves four steps. In the first we observe that 

$$\psi(t) = \frac{1}{\sqrt{2W_m}}\,w_m(t), \tag{8.44a}$$

† It can be shown$^{21}$ by methods beyond the scope of this text that any random process can be satisfactorily represented over a finite time interval $[0, T]$ by means of Eq. 8.41a with $\{m_k\}$ that are uncorrelated. We need only take $K/T$ to be large enough and choose an appropriate set $\{\psi_k(t)\}$. The resulting representation is referred to as a Karhunen-Loève expansion. 

where $w_m(t)$ is the impulse response of $W_m(f)$: 

$$w_m(t) = \int_{-\infty}^{\infty} W_m(f)\,e^{j2\pi ft}\,df = \int_{-W_m}^{W_m} \cos 2\pi ft\,df = 2W_m\,\frac{\sin 2\pi W_m t}{2\pi W_m t}. \tag{8.44b}$$

Thus $\psi(t)$, hence each of the $\{\psi_k(t)\}$, is an ideally bandlimited waveform. Several of the $\{\psi_k(t)\}$ are sketched in Fig. 8.12. 

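In the second step we observe that the $\{\psi_k(t)\}$ are in fact orthonormal: the Fourier transform of $\psi_k(t)$ is 

$$\Psi_k(f) = \frac{1}{\sqrt{2W_m}}\,W_m(f)\,e^{-j2\pi fk/2W_m},$$

from which it follows by Parseval’s theorem that 

$$\int_{-\infty}^{\infty} \psi_i(t)\,\psi_k(t)\,dt = \int_{-\infty}^{\infty} \Psi_i(f)\,\Psi_k^*(f)\,df = \frac{1}{2W_m}\int_{-W_m}^{W_m} e^{j2\pi f(k-i)/2W_m}\,df = \delta_{ik}. \tag{8.45}$$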

In the third step, we invoke the sampling theorem, which is proved in Appendix 8A. The theorem may be stated: 

Theorem. If $z(t)$ is any finite energy waveform whose Fourier transform is identically zero for $|f| > W_m$, then 

$$z(t) = \sum_{k=-\infty}^{\infty} z_k\,\psi_k(t), \tag{8.46a}$$

in which, of course, 

$$z_k = \int_{-\infty}^{\infty} z(t)\,\psi_k(t)\,dt; \qquad \text{for all } k. \tag{8.46b}$$


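A remarkable property of the $\{\psi_k(t)\}$ is that each $z_k$ may also be evaluated simply by observing $z(t)$ at the instant $t = k/2W_m$: letting $Z(f)$ denote the Fourier transform of $z(t)$, we have 

$$\int_{-\infty}^{\infty} z(t)\,\psi(t - \tau)\,dt = \int_{-\infty}^{\infty} Z(f)\left[\frac{1}{\sqrt{2W_m}}\,W_m(f)\,e^{-j2\pi f\tau}\right]^{*} df = \frac{1}{\sqrt{2W_m}}\int_{-W_m}^{W_m} Z(f)\,e^{+j2\pi f\tau}\,df = \frac{1}{\sqrt{2W_m}}\,z(\tau),$$

so that setting $\tau = k/2W_m$ yields 

$$z_k = \frac{1}{\sqrt{2W_m}}\,z\!\left(\frac{k}{2W_m}\right); \qquad \text{for all } k. \tag{8.47}$$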


Figure 8.12 Impulse response of an ideal band-limited filter. 

Because of this unique characteristic, the $\{\psi_k(t)\}$ of Eq. 8.43 are called sampling functions. Equations 8.44 and 8.47 imply that the sampling theorem may also be written in the form 

$$z(t) = \sum_{k=-\infty}^{\infty} z\!\left(\frac{k}{2W_m}\right)\frac{\sin 2\pi W_m(t - k/2W_m)}{2\pi W_m(t - k/2W_m)}. \tag{8.48}$$

(A check on Eq. 8.48 may be obtained by setting $t = i/2W_m$.) 
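
The interpolation formula of the sampling theorem can be tried numerically. The sketch below is illustrative only: it truncates the infinite sum to $|k| \le k_{\max}$, and it uses a hypothetical test waveform, $(\sin \pi t/\pi t)^2$, chosen because it has finite energy and a transform confined to $|f| \le 1$. The names `reconstruct` and `z_test` are not from the text.

```python
import math

def sinc(x):
    """sin(x)/x with the removable singularity at x = 0 filled in."""
    return 1.0 if x == 0.0 else math.sin(x) / x

def reconstruct(z, W_m, t, k_max):
    """Truncated interpolation sum: samples z(k/2W_m) weighted by
    sin(2*pi*W_m*(t - k/2W_m)) / (2*pi*W_m*(t - k/2W_m)), |k| <= k_max."""
    total = 0.0
    for k in range(-k_max, k_max + 1):
        t_k = k / (2.0 * W_m)
        total += z(t_k) * sinc(2.0 * math.pi * W_m * (t - t_k))
    return total

def z_test(t):
    """Hypothetical test waveform, bandlimited to |f| <= 1 (triangular spectrum)."""
    return sinc(math.pi * t) ** 2

W_m = 1.0
# Off the sampling grid the truncated sum should be close to z(t).
error_off_grid = abs(reconstruct(z_test, W_m, 0.37, 500) - z_test(0.37))
# On the grid (t = i/2W_m with i = 3) only the k = i term survives.
error_on_grid = abs(reconstruct(z_test, W_m, 3 / (2.0 * W_m), 500) - z_test(1.5))
print(error_off_grid, error_on_grid)
```

The on-grid case is the check suggested in the parenthetical remark above: every interpolating term except the one centered on the chosen sample vanishes there.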

The fourth step in verifying the appropriateness of adopting the sampling 
functions for the $\{\psi_k(t)\}$ involves extending the sampling theorem to 
ideally bandlimited random processes. Since the sample functions of such 




Figure 8.13 Implementation of the sampling theorem. From Eq. 8.44, the impulse 
response of the filter $H(f)$ is $h(t) = \psi(t)$. 


a process do not in general contain finite energy, the sampling theorem as 
originally stated does not apply to them. 

The key to the extension is to note that the theorem does apply to the 
impulse response of the first filter in Fig. 8.13a. But the following cascade 
of sampler, impulse-train modulator, and second filter is simply a realization of the mathematical construct of Eq. 8.48. Thus the over-all circuit of Fig. 8.13a is equivalent to just the first filter $W_m(f)$ alone. 

It is clear that passing a random process m(t ) through a second filter 
W m (f) does not affect the sample functions when m(t) has already been 
passed once through such a filter. Thus Eq. 8.46a applies also to any 
random process $m(t)$ that is already bandlimited to $[-W_m, W_m]$. From Fig. 8.13b, it is evident that 

$$m(t) = \sum_{k=-\infty}^{\infty} m_k\,\psi_k(t), \tag{8.49a}$$

in which each random variable $m_k$ is given by 

$$m_k = \frac{1}{\sqrt{2W_m}}\,m\!\left(\frac{k}{2W_m}\right). \tag{8.49b}$$


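For the special case in which $m(t)$ is filtered white Gaussian noise with power density $\mathcal{N}_0/2$, the $\{m_k\}$ are statistically independent zero-mean Gaussian random variables with variance $\mathcal{N}_0/2$. Conversely, a stationary Gaussian process, say $n(t)$, with 

$$\mathcal{S}_n(f) = \begin{cases} \dfrac{\mathcal{N}_0}{2}; & |f| < W_m \\ 0; & \text{elsewhere,} \end{cases} \tag{8.50a}$$

results when an infinite set of independent Gaussian variables $\{n_k\}$, each having $\overline{n_k} = 0$ and $\overline{n_k^2} = \mathcal{N}_0/2$, is used to construct 

$$n(t) = \sum_{k=-\infty}^{\infty} n_k\,\psi_k(t). \tag{8.50b}$$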


It is not, of course, necessary to presume that the process $m(t)$ in Eq. 8.49 is stationary. For example, if we know a priori that only those $m_k$ with index $k = 1, 2, \ldots, K$ can be nonzero or if we are interested in communicating only the portion of a process that is attributable to these $\{m_k\}$, we may use the finite summation 

$$m(t) = \sum_{k=1}^{K} m_k\,\psi_k(t) \tag{8.51a}$$

to represent the transmitter input. Alternatively, with a shift of time origin, we would have 

$$m(t) = \sum_{k=-(K-1)/2}^{(K-1)/2} m_k\,\psi_k(t); \qquad K \text{ an odd integer.} \tag{8.51b}$$

Such nonstationary processes are completely defined once the joint probability density function of the coefficients, $p_{\mathbf{m}}$, is specified. It is convenient to view the infinite summation of Eq. 8.49a as the limit of Eq. 8.51b as $K$ becomes infinite. 

The assumption that a process $m(t)$ is ideally bandlimited is not completely realistic. We have already noted in Chapter 7 that the filter $W_m(f)$ is physically unrealizable. On the other hand, as discussed in Appendix 8A, the approximation entailed in such an assumption is a good one in most cases of engineering interest and, of course, for any given process becomes increasingly accurate as $W_m$ is increased. We therefore assume that $m(t)$ can be adequately represented by means of Eq. 8.51 and the sampling functions of Eq. 8.43. The issues of maximum-likelihood reception remain unchanged, however, if the orthonormal set $\{\psi_k(t)\}$ is defined in some other suitable way. 

Performance measure. A convenient performance measure to use in the communication of nonstationary random processes having the form of Eq. 8.51a is the mean-integral-square error between $m(t)$ and the receiver output process, say $\hat{m}(t)$, normalized by the number of samples, $K$. We define 

$$\overline{\epsilon^2} \triangleq \frac{1}{K}\,E\left[\int_{-\infty}^{\infty} [m(t) - \hat{m}(t)]^2\,dt\right]. \tag{8.52}$$

We now show that the adoption of this performance criterion implies 
equivalence between the problems of random waveform and random 
vector communication. 

When the receiver knows a priori that $m(t)$ is bandlimited to $[-W_m, W_m]$, it is clear that $\hat{m}(t)$ should also be bandlimited. Indeed, if 

$$m_0(t) = \sum_{k=1}^{K} m_{0k}\,\psi_k(t)$$

denotes the particular sample function of $m(t)$ that is actually transmitted and $\hat{m}_0(t)$ denotes the resulting receiver estimate when any particular noise disturbance occurs, the integrated square error is 

$$\epsilon_0^2 = \int_{-\infty}^{\infty} [m_0(t) - \hat{m}_0(t)]^2\,dt = \int_{-\infty}^{\infty} |M_0(f) - \hat{M}_0(f)|^2\,df. \tag{8.53}$$

Here $M_0(f)$ and $\hat{M}_0(f)$ are the Fourier transforms of $m_0(t)$ and $\hat{m}_0(t)$, respectively. Evidently, $\epsilon_0^2$ must be increased if $\hat{M}_0(f)$ is nonzero outside $[-W_m, W_m]$, so that no loss in performance is entailed if the receiver estimate $\hat{m}_0(t)$ is also represented by the sampling theorem: 

$$\hat{m}_0(t) = \sum_{k=-\infty}^{\infty} \hat{m}_{0k}\,\psi_k(t), \tag{8.54a}$$

$$\hat{m}_{0k} = \frac{1}{\sqrt{2W_m}}\,\hat{m}_0\!\left(\frac{k}{2W_m}\right). \tag{8.54b}$$

Next, using Eq. 4.58b to write 

$$\epsilon_0^2 = \sum_{k=-\infty}^{\infty} (m_{0k} - \hat{m}_{0k})^2,$$

we observe that $\epsilon_0^2$ is further reduced by setting to zero all $\hat{m}_{0k}$ outside the range $1 \le k \le K$ spanned by $m_0(t)$. The result is 

$$\epsilon_0^2 = \sum_{k=1}^{K} (m_{0k} - \hat{m}_{0k})^2. \tag{8.55}$$

Since Eq. 8.55 is valid for any $m_0(t)$ and any particular noise disturbance, the receiver output process can always be written 

$$\hat{m}(t) = \sum_{k=1}^{K} \hat{m}_k\,\psi_k(t). \tag{8.56}$$

Averaging Eq. 8.55 over the message and noise processes yields 

$$\overline{\epsilon^2} = \frac{1}{K}\sum_{k=1}^{K} \overline{(m_k - \hat{m}_k)^2}. \tag{8.57}$$

Equation 8.57 implies that the problem of communicating $m(t)$ is indeed equivalent to the problem of communicating $\mathbf{m} = (m_1, m_2, \ldots, m_K)$. We may use any convenient set of orthonormal functions $\{\varphi_k(t)\}$ and transmit 

$$s_m(t) = A\sum_{k=1}^{K} m_k\,\varphi_k(t). \tag{8.58}$$

The receiver then estimates $\mathbf{m}$ by a vector $\hat{\mathbf{m}} = (\hat{m}_1, \hat{m}_2, \ldots, \hat{m}_K)$, and constructs $\hat{m}(t)$ in accordance with Eq. 8.56. 

Receiver implementation. The structure of the maximum-likelihood receiver follows directly from Eq. 8.56 and Fig. 8.10. An over-all system is illustrated in Fig. 8.14. From Eqs. 8.40 and 8.57 the mean-square error per component at the receiver output is 

$$\overline{\epsilon^2} = \frac{1}{K}\sum_{k=1}^{K} \frac{\mathcal{N}_0}{2A^2} = \frac{\mathcal{N}_0}{2A^2}. \tag{8.59}$$


Figure 8.14 Maximum-likelihood receiver for linearly modulated waveform communication ($K$ finite). The capacitors “hold” the output $\hat{m}_k$ from each matched filter, $h_k(t) = \varphi_k(T - t)$, until time $t_k = T + k/2W_m$, $k = 1, 2, \ldots, K$. 

In accordance with Eq. 8.58, the average energy transmitted per sample is 

$$E_m = \frac{1}{K}\,\overline{\int_{-\infty}^{\infty} s_m^2(t)\,dt} = \frac{A^2}{K}\sum_{k=1}^{K} \overline{m_k^2}. \tag{8.60a}$$

Whenever 

$$\overline{m_k^2} = \overline{m^2}; \qquad \text{all } k,$$

we have 

$$E_m = \overline{m^2}\,A^2. \tag{8.60b}$$

The signal-to-noise ratio per component then also agrees with the single-input-variable result of Eq. 8.25b, 

$$\frac{S}{N} = \frac{2E_m}{\mathcal{N}_0}. \tag{8.60c}$$

Similarly, if $|m_k| \le 1$, all $k$, then $A^2 = E_s$, the peak transmitted energy per sample, and Eq. 8.59 becomes 

$$\overline{\epsilon^2} = \frac{\mathcal{N}_0}{2E_s}.$$

The special case in which the $\{\varphi_k(t)\}$ of Eq. 8.58 are chosen to be just the sampling functions $\{\psi_k(t)\}$ themselves is shown in Fig. 8.15a. If we assume that 

$$m(t) = \sum_{k=-K/2}^{K/2} m_k\,\psi_k(t), \tag{8.61a}$$

then 

$$s_m(t) = A\,m(t). \tag{8.61b}$$

In accordance with Fig. 8.13, in the limit $K \to \infty$ the maximum-likelihood receiver becomes just an attenuator followed by an ideal filter. The resulting system is illustrated in Fig. 8.15b; it is apparent that the receiver-output noise is stationary, with 

$$\overline{n^2(t)} = \overline{[\hat{m}(t) - m(t)]^2} = \frac{\mathcal{N}_0 W_m}{A^2}. \tag{8.61c}$$

Note that in the stationary case the mean-square error per sample and $\overline{n^2(t)}$ are related by 

$$\overline{\epsilon^2} = \frac{\overline{n^2(t)}}{2W_m}. \tag{8.61d}$$

Figure 8.15 Maximum likelihood receiver when the transmitter is an amplifier: (a) $K$ finite; (b) $K$ infinite. 


The minimum mean-square error receiver when m(t) is a stationary 
Gaussian process with power density function 


»«(/) = 2 ; * (»■«“) 

VO; elsewhere 

is very similar to the maximum-likelihood receiver. Since 

a„(T)=x„i ( 862b > 

each of the normalized samples {(2W m )~ >A m(kl2W m )}, k an integer, has 
variance a 2 = X 0 /2, and all such samples are statistically independent. 
The receiver may therefore estimate each sample independently and re- 
combine the sample estimates in accordance with Eq. 8.56. It follows 
from Eq. 8.19 that the minimum mean-square error receiver consists of an ideal filter $W_m(f)$ and an attenuator with gain 

$$G = \frac{1/A}{1 + \mathcal{N}_0/2E_m}, \tag{8.63a}$$

in which 

$$E_m = \frac{\overline{m^2(t)}\,A^2}{2W_m} = \sigma^2 A^2 \tag{8.63b}$$

is the mean transmitted energy per sample. Such a receiver is shown in Fig. 8.16. It may be verified directly that the resulting mean-square error is 

$$\overline{[m(t) - \hat{m}(t)]^2} = \frac{\mathcal{N}_0 W_m/A^2}{1 + \mathcal{N}_0/2E_m}, \tag{8.63c}$$

which differs from Eq. 8.61c only by the factor $(1 + \mathcal{N}_0/2E_m)^{-1}$. In Eq. 8.63c we have chosen not to denote $[m(t) - \hat{m}(t)]$ by $n(t)$ because the error is not a message-independent additive term; as observed in connection with Eq. 8.27, the fact that the gain, $G$, in Eq. 8.63a is not $1/A$ implies an error that depends on $m(t)$ as well as on the actual receiver input noise. 



Figure 8.16 Minimum mean-square-error receiver when m(t) is a stationary band- 
limited Gaussian process. 


The receiver of Fig. 8.16 is minimax and yields the same mean-square 
error for any m(t) having the power density function of Eq. 8.62a, regard- 
less of whether or not m{t) is Gaussian. An alternate derivation of the 
structure of this receiver, using minimum mean-square-error linear filter 
theory, is provided in Appendix 8B. 

Although the instrumentation of the transmitter and receiver is simplified for $K \to \infty$ by using the sampling functions for the $\{\varphi_k(t)\}$ in communicating $m(t)$, there is often good reason not to do so. For example, a PAM time-multiplexed voice communication system is built by choosing the $\{\varphi_k(t)\}$ to be evenly spaced pulses whose duration is much less than the sampling interval $1/2W_m$. Several voice channels can then be interleaved in time onto a single transmission facility. A PAM system may also be frequency-multiplexed. 

Frequency translation. Another reason for using a different set of orthonormal functions than the $\{\psi_k(t)\}$ for transmission purposes concerns the propagation of electromagnetic energy: an audio-frequency process $m(t)$ must be converted into a radio-frequency (RF) process if radio transmission is to be used. Functions $\{\varphi_k(t)\}$ can be chosen whose spectra lie in a convenient frequency range. 

Just as in the discrete communication discussion in Section 7.2, a more usual way to achieve this same objective is to multiply $m(t)$ by a sine-wave carrier, say $\sqrt{2}\cos 2\pi f_0 t$, in which $f_0 > W_m$ is the desired radio frequency. The signal is then heterodyned back down to baseband at the receiver. As in Chapter 7, either DSB-SC or SSB transmission may be used. The corresponding maximum likelihood receivers are illustrated in Figs. 8.17a and b for the limiting case $K \to \infty$. Both receivers produce stationary output noise with variance 

$$\overline{n^2(t)} = \frac{\mathcal{N}_0 W_m}{A^2}.$$
With waveform-input communication, a third common type of linear 
modulation is standard double sideband (DSB), in which the transmitted 
signal process is 

$$s_m(t) = A\,[1 + a\,m(t)]\,\sqrt{2}\cos\omega_0 t. \tag{8.64a}$$

Thus an additive carrier term $A\sqrt{2}\cos\omega_0 t$ is transmitted in addition to the input-signal-dependent term. A typical sample function of $s_m(t)$ is shown in Fig. 8.18a. We require that the parameter $a$, called the “modulation index,” be chosen so that 

$$1 + a\,m(t) > 0; \qquad \text{for all } t. \tag{8.64b}$$

Thus the envelope of $s_m(t)$ is a replica of $m(t)$, which is the reason for using DSB. 

As shown in Fig. 8.18b the maximum-likelihood receiver in this case again multiplies the received signal by $\sqrt{2}\cos\omega_0 t$ and passes the product through a lowpass filter $W_m(f)$. The filter output is 

$$A + aA\,m(t) + n_1(t), \tag{8.64c}$$

in which $n_1(t)$ is lowpass Gaussian with power density $\mathcal{N}_0/2$ over the band $[-W_m, W_m]$. The dc component of the filter output, caused by the carrier, is removed by a blocking capacitor. After scaling, the output waveform is 

$$\hat{m}(t) \approx m(t) + \frac{n_1(t)}{aA}, \tag{8.64d}$$

in which equality is not exact because the blocking capacitor also affects $n_1(t)$ and (possibly) $m(t)$. The output noise, $n(t) \triangleq n_1(t)/aA$, is stationary, but now has variance 

$$\overline{n^2(t)} \approx \frac{\mathcal{N}_0 W_m}{a^2A^2}. \tag{8.64e}$$

From the point of view of the energy transmitted in the modulation sidebands, DSB, SSB, and DSB-SC all yield equivalent noise performance; but the carrier $\sqrt{2}A\cos\omega_0 t$ in DSB consumes transmitted energy which does not contribute to improving the estimate $\hat{m}(t)$. It follows that DSB suffers at least a 3-db†, and usually a 6-db or greater, disadvantage over DSB-SC and SSB, the value depending on the modulation index and the 
waveshape of the modulating signal. 
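
The size of the carrier penalty can be estimated with a short calculation. The sketch below assumes that $m(t)$ has no dc component, so that the average transmitted power of the DSB signal is proportional to $1 + a^2\overline{m^2}$ while only the fraction $a^2\overline{m^2}$ lies in the sidebands; the function name and the three test waveshapes are illustrative choices, not taken from the text.

```python
import math

def dsb_penalty_db(a, mean_square_m):
    """Extra transmitted power (in db) that DSB needs, relative to DSB-SC,
    for the same output noise variance.

    Assumes m(t) has zero mean, so carrier and sideband powers add:
    total power ~ 1 + a^2 * mean_square_m, sideband power ~ a^2 * mean_square_m.
    """
    sideband = a * a * mean_square_m
    return 10.0 * math.log10((1.0 + sideband) / sideband)

# Limiting case: square-wave message m(t) = +/-1 with a approaching 1.
square_wave_db = dsb_penalty_db(1.0, 1.0)
# Unit-amplitude sinusoidal message (mean square 1/2), a approaching 1.
sine_wave_db = dsb_penalty_db(1.0, 0.5)
# Same sinusoid with a smaller modulation index.
half_index_db = dsb_penalty_db(0.5, 0.5)
print(round(square_wave_db, 2), round(sine_wave_db, 2), round(half_index_db, 2))
```

Under these assumptions the penalty cannot fall below $10\log_{10}2 \approx 3$ db, since the constraint of Eq. 8.64b limits $a^2\overline{m^2}$ to at most unity, and it grows quickly as the modulation index is reduced.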

In spite of this disadvantage, DSB is in common use. The primary 
reason is that with DSB an inexpensive receiver which utilizes an envelope 



Figure 8.18 DSB modulation and maximum-likelihood reception ($K \to \infty$); $s_m(t) = A[1 + a\,m(t)]\sqrt{2}\cos\omega_0 t$. 


detector can be used for demodulation. Such an incoherent receiver is nonideal when the phase is known, but, as in the discrete case, the performance loss is small when $\mathcal{N}_0 W_m$ is much less than $A^2$. Also, the DSB signal is the easiest to generate at high power levels. 

As discussed in Chapter 7, SSB has the advantage over DSB-SC or DSB 
of reduced bandwidth. Moreover, if the phase relations between the two 
sidebands in DSB-SC or DSB are disturbed during propagation, inter- 
ference effects result in the demodulator output. Finally, the apparent 
requirement for phase-lock between the SSB modulator and demodulator 
oscillators is not essential in voice communication systems: the human 
ear is relatively insensitive to received phase and even a few cycles of 

† We assume, as usual, that $m(t)$ itself contains no dc component. 
slowly drifting frequency offset is tolerable. As a result of these con- 
siderations, SSB finds frequent application in systems for operation over 
long-range, high-frequency channels, whereas DSB is used extensively in 
medium-frequency broadcast transmission. DSB-SC has found little 
application in audio communication. 

8.2 TWISTED MODULATION 

Linear modulation and maximum-likelihood reception produce a mean-square error that can be reduced in general only by increasing the transmitted energy. For white Gaussian noise with power density $\mathcal{N}_0/2$ we have seen that 

$$\overline{\epsilon^2} = \frac{\mathcal{N}_0}{2A^2} = \frac{\overline{m^2}}{2E_m/\mathcal{N}_0}.$$

If we no longer require that the modulation be linear, it is sometimes possible to decrease $\overline{\epsilon^2}$ for a given $\overline{m^2}$ without increasing the transmitted energy. In particular, various twisted modulation schemes such as pulse-position modulation (PPM) and frequency modulation (FM) can yield an $\overline{\epsilon^2}$ smaller than that afforded by linear modulation. 

Geometrical Considerations 

Insight into the advantages and limitations of twisted modulation can be gained by investigating the geometrical relations between the channel noise and $\overline{\epsilon^2}$. Let us begin by reconsidering the locus of the transmitted vector, 

$$\mathbf{s}_m = mA\,\boldsymbol{\varphi}_1, \tag{8.65}$$

when a single bounded random variable is communicated by means of linear modulation. As illustrated in Fig. 8.19, we may think of the transmitter amplifier as “stretching” the interval $[-1, +1]$ over which $p_m$ is nonzero onto the larger interval $[-A, +A]$ in the signal space. The stretching is uniform in the sense that 

$$\left|\frac{d\mathbf{s}_m}{dm}\right| = A; \qquad \text{for all } m. \tag{8.66}$$

The effect of maximum-likelihood reception is to undo the stretching performed by the transmitter. The receiver’s attenuator in Fig. 8.19 compresses the message component $Am$ from the interval $[-A, +A]$ back onto the interval $[-1, +1]$. If $m = m_0$ and $r_1 = \rho$, the relevant error in transmission is 

$$n_1 = \rho - m_0 A. \tag{8.67a}$$

Figure 8.19 Geometrical relations with linear modulation and a single random variable input. The maximum-likelihood receiver chooses $\hat{m}$ to minimize $|\mathbf{r} - \mathbf{s}_{\hat{m}}|$. 

With no saturating transducer in the receiver, undoing the stretch compresses the transmission error by the inverse factor $1/A$ and again yields 

$$\overline{\epsilon^2} = \overline{\left(\frac{n_1}{A}\right)^2} = \frac{\mathcal{N}_0}{2A^2}. \tag{8.67b}$$


The dependence of $\overline{\epsilon^2}$ on the amount of stretching at the transmitter can be made explicit by defining 

$$S \triangleq \left|\frac{d\mathbf{s}_m}{dm}\right|, \tag{8.68}$$

where $S$ is independent of $m$ by virtue of Eq. 8.66. We call $S$ the stretch factor. In terms of $S$, Eq. 8.67b is written 

$$\overline{\epsilon^2} = \frac{\mathcal{N}_0/2}{S^2}. \tag{8.69}$$

Twisted loci. The stretch factor plays a fundamental role in determining the mean-square error for twisted as well as for linear modulation schemes. Before considering specific systems, it is convenient to illustrate the basic concept of twisted modulation in a simple situation. Consider again the single-input-parameter communication system of Fig. 8.1 and for simplicity assume that the transmitter signal space is a two-dimensional plane with Euclidean axes $\boldsymbol{\varphi}_1$ and $\boldsymbol{\varphi}_2$. This implies that the transmitted signal vector, as a function of the input parameter $m$, has the form 

$$\mathbf{s}_m = a_1(m)\,\boldsymbol{\varphi}_1 + a_2(m)\,\boldsymbol{\varphi}_2, \tag{8.70a}$$

where the $\{\boldsymbol{\varphi}_i\}$ are orthonormal vectors. The corresponding signal waveform is 

$$s_m(t) = a_1(m)\,\varphi_1(t) + a_2(m)\,\varphi_2(t), \tag{8.70b}$$

with 

$$\int_{-\infty}^{\infty} \varphi_i(t)\,\varphi_j(t)\,dt = \delta_{ij}. \tag{8.70c}$$

If the modulation system is to be linear, the coefficients $\{a_i(m)\}$ must be linear functions of $m$. But let us now broaden the class of systems under consideration and allow the $\{a_i(m)\}$ to be arbitrary differentiable functions. In general, these functions may be complicated, as in the case of the example shown in Fig. 8.20: the curved (twisted) line represents the locus, described parametrically by the $\{a_i(m)\}$, of the tip of the vector $\mathbf{s}_m$ as a function of $m$. We again assume that $m$ is constrained to the 


Figure 8.20 A twisted signal locus. 

interval $[-1, +1]$. If we impose a constraint on the maximum transmitted energy such that $|\mathbf{s}_m|^2 \le E_s$, the locus (called the “signal curve”) must be contained within a circle of radius $\sqrt{E_s}$. 

The signal curve by itself is not a complete description of the mapping $m \to \mathbf{s}_m$; we must also specify how the tip of $\mathbf{s}_m$ moves along the locus as a function of $m$. It is convenient to assume that equal increments in $m$ correspond to equal increments in distance measured along the signal curve; that is, 

$$\left|\frac{d\mathbf{s}_m}{dm}\right| = \frac{L}{2}; \qquad \text{for all } m, \tag{8.71a}$$

where $L$ is the total length of the locus traversed by $\mathbf{s}_m$ as $m$ increases from $-1$ to $+1$. Then we can again define a stretch factor $S$ that is independent of $m$: 

$$S \triangleq \left|\frac{d\mathbf{s}_m}{dm}\right| = \frac{L}{2}. \tag{8.71b}$$

It is apparent that this definition is consistent with that given previously in the case of linear modulation. We shall see later that the uniform stretch assumption of Eq. 8.71a is justified by minimax considerations, as well as by considerations of mathematical simplicity. 

Weak noise suppression. We now show that the mean-square error for a maximum-likelihood receiver, given the signaling scheme of Fig. 8.20 and an additive white Gaussian noise disturbance whose power density is sufficiently weak, is approximated by 

$$\overline{\epsilon^2} \approx \frac{\mathcal{N}_0/2}{|d\mathbf{s}_m/dm|^2} = \frac{\mathcal{N}_0/2}{S^2}. \tag{8.72}$$

Why this is so is clarified in Fig. 8.21, which represents an enlargement of a small section of Fig. 8.20 around some point $\mathbf{s}_0$. We assume that the input parameter $m$ has the value $m_0$ corresponding to $\mathbf{s}_m = \mathbf{s}_0$ and that the noise density is so small that with high probability the received point $\mathbf{r}$ will lie close to $\mathbf{s}_0$. By this we mean that within a circle of radius equal to several standard deviations of the noise, centered on $\mathbf{r}$, the signal curve may be accurately approximated by a single straight line tangent to $\mathbf{s}_m$ at $\mathbf{s}_0$: 

$$\mathbf{s}_m \approx \mathbf{s}_0 + (m - m_0)\,\dot{\mathbf{s}}_0 \tag{8.73a}$$

with 

$$\dot{\mathbf{s}}_0 \triangleq \left.\frac{d\mathbf{s}_m}{dm}\right|_{m=m_0}. \tag{8.73b}$$

Under these conditions the “local” reception problem is geometrically the same as the problem with linear modulation. With white Gaussian noise the maximum likelihood receiver chooses $\hat{m}$ as that value of $m$ for which $|\mathbf{r} - \mathbf{s}_m|$ is minimum. Given sufficiently weak noise, we may neglect the probability that $\mathbf{r}$ will lie closer to some other fold of the signal curve than to the section approximated by Eq. 8.73. In the vicinity 



Figure 8.21 Reception in weak white Gaussian noise. Since $p_{\mathbf{r}}(\boldsymbol{\rho} \mid m = \alpha) \sim \exp[-|\boldsymbol{\rho} - \mathbf{s}_\alpha|^2/\mathcal{N}_0]$, when $\mathbf{r} = \boldsymbol{\rho}$ a maximum-likelihood receiver considers only the fold of $\mathbf{s}_m$ nearest to the point $\boldsymbol{\rho}$. The probability that this fold contains the transmitted point $\mathbf{s}_0$ approaches unity as $\mathcal{N}_0/2 \to 0$. 

of $\mathbf{r}$ the locus of possible transmitter points then looks like a straight line with a local stretch factor $|\dot{\mathbf{s}}_0|$. Hence Eq. 8.69 applies locally, and the conditional mean-square error is 

$$E[(\hat{m} - m)^2 \mid m = m_0] \approx \frac{\mathcal{N}_0/2}{|\dot{\mathbf{s}}_0|^2}. \tag{8.74}$$

Invoking the condition 

$$|\dot{\mathbf{s}}_0| = S; \qquad \text{for all } m_0 \tag{8.75}$$

and averaging with respect to $p_m$ yields Eq. 8.72. (We again neglect end 
truncation effects that reduce the error in the vicinity of $m = \pm 1$.) 
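
The weak-noise approximation can be tried on a simple twisted locus by simulation. The sketch below is a hypothetical example, not a system from the text: the signal curve is a circular arc of radius $\sqrt{E_s}$ covering a fraction `turn` of a half-circle on each side, so that the stretch is uniform with $S = L/2$. Because the nearest point of a circular arc to a received point has the same polar angle, the maximum-likelihood estimate reduces to an angle measurement. The input is restricted to $|m| \le 0.9$ to stay clear of end effects.

```python
import math
import random

def simulate_arc(num_trials=20000, E_s=1.0, noise_var=1e-4, turn=0.9, seed=7):
    """Monte Carlo mean-square error for a circular-arc signal locus.

    noise_var plays the role of N0/2. The locus is
    s_m = sqrt(E_s) * (cos(turn*pi*m), sin(turn*pi*m)), -1 <= m <= 1.
    """
    rng = random.Random(seed)
    radius = math.sqrt(E_s)
    sigma = math.sqrt(noise_var)
    stretch = turn * math.pi * radius        # S = L/2 with L = 2*turn*pi*radius
    total = 0.0
    for _ in range(num_trials):
        m = rng.uniform(-0.9, 0.9)           # keep away from the ends of the arc
        theta = turn * math.pi * m
        r1 = radius * math.cos(theta) + rng.gauss(0.0, sigma)
        r2 = radius * math.sin(theta) + rng.gauss(0.0, sigma)
        # Nearest point on the arc has the polar angle of r, clipped to the arc.
        m_hat = math.atan2(r2, r1) / (turn * math.pi)
        m_hat = max(-1.0, min(1.0, m_hat))
        total += (m_hat - m) ** 2
    return total / num_trials, noise_var / stretch ** 2

mse_simulated, mse_predicted = simulate_arc()
mse_linear = 1e-4 / 1.0                      # (N0/2)/A^2 with A^2 = E_s = 1
print(mse_simulated, mse_predicted, mse_linear)
```

With these illustrative numbers the arc is about $0.9\pi$ times longer than the straight-line locus of the same peak energy, so the predicted error is smaller than the linear-modulation value by roughly $(0.9\pi)^2 \approx 8$.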

Next, we show that the validity of Eq. 8.72 for weak noise is not restricted to transmitted waveforms $s_m(t)$ defined (as in Eq. 8.70) on only two $\{\varphi_i(t)\}$; $s_m(t)$ may involve any number of orthonormal functions. For noise that is weak enough the conditional mean-square error when $m = m_0$ depends only on the behavior of $s_m(t)$ in the neighborhood of $m_0$. In this vicinity it is true for any differentiable $s_m(t)$ that 

$$s_m(t) \approx s_0(t) + (m - m_0)\,\dot{s}_0(t); \qquad |m - m_0| \ll 1, \tag{8.76a}$$

$$s_0(t) \triangleq s_m(t)\big|_{m=m_0}, \tag{8.76b}$$

$$\dot{s}_0(t) \triangleq \left.\frac{\partial}{\partial m}\,s_m(t)\right|_{m=m_0}. \tag{8.76c}$$

By a Gram-Schmidt argument the two waveforms $s_0(t)$ and $\dot{s}_0(t)$ require at most two dimensions for their representation. Hence, for any given $m_0$ we can write 

$$\mathbf{s}_m \approx \mathbf{s}_0 + (m - m_0)\,\dot{\mathbf{s}}_0; \qquad |m - m_0| \ll 1, \tag{8.77}$$

in which (as in Eq. 8.73) the vectors $\mathbf{s}_0$ and $\dot{\mathbf{s}}_0$ are two-dimensional and represent $s_0(t)$ and $\dot{s}_0(t)$, respectively. The argument leading to Eq. 8.72 therefore applies without change. 

It is convenient to express the stretch factor $S$ directly in terms of $s_m(t)$. Recalling that the magnitude squared of a vector is equal to the energy in the corresponding time function, we have 

$$\left|\frac{d\mathbf{s}_m}{dm}\right|^2 = \int_{-\infty}^{\infty} \left[\frac{\partial}{\partial m}\,s_m(t)\right]^2 dt. \tag{8.78}$$

Thus, whenever the right-hand side below is independent of $m$, the stretch factor is given by 

$$S^2 = \int_{-\infty}^{\infty} \left[\frac{\partial}{\partial m}\,s_m(t)\right]^2 dt \tag{8.79a}$$

and, for noise weak enough that the approximation involved in Eq. 8.77 may be neglected, 

$$\overline{\epsilon^2} \approx \frac{\mathcal{N}_0/2}{S^2}. \tag{8.79b}$$
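
The integral form of the stretch factor lends itself to direct numerical evaluation. The sketch below is a hypothetical illustration: it uses a frequency-modulated pulse $s_m(t) = \sqrt{2E/T}\cos 2\pi(f_0 + m\Delta)t$ on $[0, T]$, with parameter values chosen only for the example, approximates $\partial s_m/\partial m$ by a central difference, and integrates with the trapezoidal rule. For $f_0T \gg 1$ a direct calculation gives $S^2 \approx E(2\pi\Delta T)^2/3$, nearly independent of $m$.

```python
import math

def stretch_squared(signal, m, t_start, t_end, steps=20000, dm=1e-6):
    """Approximate the integral of (d s_m(t)/dm)^2 over [t_start, t_end].

    The derivative is a central difference in m; the integral uses the
    trapezoidal rule with `steps` intervals.
    """
    dt = (t_end - t_start) / steps
    total = 0.0
    for i in range(steps + 1):
        t = t_start + i * dt
        deriv = (signal(m + dm, t) - signal(m - dm, t)) / (2.0 * dm)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * deriv * deriv * dt
    return total

# Hypothetical FM pulse parameters (illustrative values only).
E, T, f0, delta = 1.0, 1.0, 50.0, 2.0

def fm_signal(m, t):
    return math.sqrt(2.0 * E / T) * math.cos(2.0 * math.pi * (f0 + m * delta) * t)

s2_at_0 = stretch_squared(fm_signal, 0.0, 0.0, T)
s2_at_half = stretch_squared(fm_signal, 0.5, 0.0, T)
s2_closed_form = E * (2.0 * math.pi * delta * T) ** 2 / 3.0
print(s2_at_0, s2_at_half, s2_closed_form)
```

In this example the two numerical values agree closely with each other and with the closed-form estimate, which is the condition (independence of $m$) under which the stretch factor is well defined.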


Threshold. For any given differentiable signal curve such as that of Fig. 8.20 it is always possible to take the noise density $\mathcal{N}_0/2$ small enough that the linearized analysis leading to Eq. 8.79 will be valid. For a given signal space with $\mathcal{N}_0/2$ and $E_s$ fixed, however, it is definitely not true that the mean-square error can be made as small as we like by making the length of the signal curve, hence the stretch factor $S$, larger and larger. 

The nature of the fallacy is clarified in Fig. 8.22. If we confine the signal curve to a sphere of fixed dimensionality and of radius $\sqrt{E_s}$, the length of the curve cannot be increased indefinitely without folds of the curve necessarily lying closer and closer together. On the other hand, as indicated by the dotted circles, the conditional probability density function of the received vector $\mathbf{r}$, given $\mathbf{s} = \mathbf{s}_0$, is spherically symmetric about the transmitted point $\mathbf{s}_0$, with contours of equal probability density that depend only on the noise density $\mathcal{N}_0/2$. It follows that when $L$ is increased indefinitely while $E_s$ and $\mathcal{N}_0/2$ are held constant a situation must ultimately be reached in which several different folds of the signal curve will pass through regions in which $p_{\mathbf{r}|\mathbf{s}}$ is significantly large. When this happens, the signal curve in the vicinity of the 

Figure 8.22 A signal locus for which the linearized analysis is invalid. The radii of the dotted circles are $\sqrt{\mathcal{N}_0/2}$, $2\sqrt{\mathcal{N}_0/2}$, $3\sqrt{\mathcal{N}_0/2}$. Because $p_{\mathbf{r}}(\boldsymbol{\rho} \mid m = m_0) = p_{\mathbf{n}}(\boldsymbol{\rho} - \mathbf{s}_0) \sim \exp[-|\boldsymbol{\rho} - \mathbf{s}_0|^2/\mathcal{N}_0]$, the probability is high that $\mathbf{r}$ will lie closer to some other fold of $\mathbf{s}_m$ than to $\mathbf{s}_0$. 

set of vectors that is likely to be received when $\mathbf{s}_0$ is transmitted can no longer be well approximated by the single straight line of Eq. 8.77, and the linearized analysis leading to Eq. 8.79 becomes invalid. 

Actually, we can see that not only the linearized analysis but also the entire communication system will break down under these conditions. For example, the output $\hat{m}$ from a maximum-likelihood receiver jumps discontinuously as the received vector $\mathbf{r}$ moves continuously across the boundary separating the points $\boldsymbol{\rho}_1$ and $\boldsymbol{\rho}_2$ in Fig. 8.23. Crossing such a boundary essentially “disconnects” $\hat{m}$ from the actual transmitter input, $m = m_0$. 

Furthermore, the breakdown of the communication system under these conditions is fundamentally unavoidable and cannot be ascribed merely to 

Figure 8.23 When $\mathbf{r} = \boldsymbol{\rho}_1$, the point of $\mathbf{s}_m$ closest to $\mathbf{r}$ is close to $\mathbf{s}_0$, hence $\hat{m}$ is close to $m_0$. This is not true when $\mathbf{r} = \boldsymbol{\rho}_2$. 

some deficiency in maximum-likelihood reception. Consider the situation shown in Fig. 8.24a in which the received vector $\boldsymbol{\rho}$ lies close to two folds of the signal curve. From Bayes rule the a posteriori density function of $m$ is 

$$p_m(\alpha \mid \mathbf{r} = \boldsymbol{\rho}) = \frac{p_m(\alpha)}{p_{\mathbf{r}}(\boldsymbol{\rho})}\,p_{\mathbf{r}}(\boldsymbol{\rho} \mid m = \alpha). \tag{8.80}$$

As long as $p_m(\alpha) > 0$ over the intervals of $\alpha$ that map onto these folds of $\mathbf{s}_m$, 


Figure 8.24 A received vector leading to a multimodal a posteriori density function. Setting $\hat{m}$ equal to either $-0.4$ or $+0.5$ is likely to incur a large error. Moreover, the conditional mean in (b) lies in a region of low probability density, so that the minimum mean-square estimate of $m$ is even less credible. 

some region of $\boldsymbol{\rho}$ must exist between the folds for which $p_m(\alpha \mid \mathbf{r} = \boldsymbol{\rho})$ is violently multimodal; that is, $p_m(\alpha \mid \mathbf{r} = \boldsymbol{\rho})$ contains two or more disconnected local maxima of substantially the same magnitude, as shown in Fig. 8.24b. 

When $p_{m|r}$ is violently multimodal (or, even worse, unimodal around
some value of $m$ disconnected from the transmitted value), we say that the
received signal is anomalous. In an anomalous situation it is not possible
for any receiver to make a meaningful estimate of $m$; this follows from
the fact that the a posteriori density function $p_{m|r}$, which contains all data
relevant to any estimate of $m$, is either fundamentally ambiguous or is
misleading. The objective of minimizing the mean-square error, for
example, is clearly inappropriate.

With maximum-likelihood reception, the received signal is anomalous 
whenever r lies closer to a different fold of the signal curve than it does to 
the one containing the transmitted point s 0 . It is evident geometrically 
that for a signal space of fixed dimensionality anomalous receptions are 
bound to occur with increasing probability in either of two circumstances: 

1. When $\mathcal{N}_0/2$ and $E_s$ are fixed and the stretch factor $S$ is increased.

2. When the twisted signal curve (hence $E_s$ and $S$) is fixed and $\mathcal{N}_0/2$
is increased.

When the conditions are such that the probability of an anomaly
exceeds some tolerable level, say $10^{-4}$ or less (determined by the application),
we say that threshold is exceeded and do not expect communication
to be acceptable. Unfortunately, although the mean-square error provided
by a twisted and uniformly stretched modulation system is reduced
by the factor $1/S^2$ when the noise density $\mathcal{N}_0/2$ is small enough, the noise
density at which threshold is exceeded (for fixed $E_s$ and fixed signal-space
dimensionality) is in general small when $S$ is large. In practice, it is
necessary to design a twisted-modulation system to achieve a satisfactory
compromise between these two effects.

The arguments and concepts we have discussed have been illustrated for 
convenience in a two-dimensional signal space. The same considerations 
and conclusions apply to signal curves defined on any finite dimensional 
space. In the sequel, we also apply these ideas to signal loci that require an 
infinite number of dimensions for a complete description. (The obser- 
vation in Chapter 4 that a finite number of dimensions is always sufficient 
for describing a signal set applies only to discrete systems with a finite 
number of messages.) It is shown in Section 8.4 that the probability of
anomaly must increase as the length $L$ of the signal locus increases whenever
$E_s/\mathcal{N}_0$ is held fixed, even though the dimensionality of the locus is
increased simultaneously.

Minimax considerations. We now show that the uniform-stretch
assumption of Eq. 8.71b leads to a maximum-likelihood receiver that is
“weak-noise minimax.” By this we mean that uniform stretch minimizes
the mean-square error produced by maximum-likelihood reception when
$p_m$ is chosen most adversely and $\mathcal{N}_0/2$ is very small.

Let us assume that a particular twisted signal locus of length $L$ (such as
that shown in Fig. 8.20) is given but that we are free to specify the
“velocity” with which $s_m$ moves along this locus as a function of $m$; that
is to say, assume that we can vary $S(m) = |ds_m/dm|$ subject to the constraint
that $s_m$ moves from one end of the locus to the other as $m$ ranges
over $[-1, +1]$:

$$\int_{-1}^{1} S(m)\, dm = L. \tag{8.81}$$

Our problem is to assign $S(m)$, as a function of $m$, in such a way that $\overline{\epsilon^2}$ is
minimax. The case in which $S(m) = L/2$ for all $|m| < 1$ has already been
investigated. From Eq. 8.79 we have

$$\overline{\epsilon^2} = \frac{\mathcal{N}_0/2}{(L/2)^2}, \tag{8.82}$$

a result that is valid for any bounded $p_m$ as long as the error-truncation
effects near $m = \pm 1$ are negligible. Thus the weak-noise minimax
assertion may be proved by considering every other assignment of $S(m)$
and showing that the resulting value of $\overline{\epsilon^2}$ is larger than $2\mathcal{N}_0/L^2$ when $p_m$ is
chosen to be as disadvantageous as possible.

If the stretch is not uniform, there is some region along the locus over
which $S(m)$ is minimum. The constraint of Eq. 8.81 implies that this
minimum value, say $S_{\min}$, must be less than $L/2$. If $p_m$ is chosen so that
almost unit probability is concentrated in the region corresponding to
minimum $S(m)$, it follows from Eq. 8.74 that

$$\overline{\epsilon^2} = \int_{-1}^{1} E[(m - \hat{m})^2 \mid m = \alpha]\, p_m(\alpha)\, d\alpha \approx \frac{\mathcal{N}_0/2}{S_{\min}^2} > \frac{2\mathcal{N}_0}{L^2}, \tag{8.83}$$

which completes the proof. An illustrative example is shown in Fig. 8.25.

The significance of this result is that, with weak enough noise, the minimax
mean-square error when $m$ is mapped onto any given signal locus is
independent of the shape of that locus under the assumption of maximum- 
likelihood reception. It is only the length of the signal curve that counts. 
Hence, a preference for one locus of a given length over another of the 
same length must depend entirely on other factors; for example, on which 
curve has the larger threshold or on which curve is easier to generate and 
receive. 
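The minimax argument can be checked numerically with a short sketch. The Python fragment below is only an illustration under assumed values: the locus length `L`, the noise parameter `n0`, the four-interval stretch profiles, and the function names are all hypothetical and do not come from the text. It evaluates the weak-noise error $(\mathcal{N}_0/2)/S^2$ at the point of smallest stretch, which is where an adversarial input density would concentrate its probability.

```python
def weak_noise_mse(stretch, n0):
    """Weak-noise conditional mean-square error (N0/2)/S^2 at local stretch S."""
    return (n0 / 2.0) / stretch ** 2


def worst_case_mse(stretch_profile, n0):
    """Adversarial input density: all probability sits where S(m) is smallest."""
    return weak_noise_mse(min(stretch_profile), n0)


# Hypothetical numbers: locus length L and noise parameter n0 (density n0/2).
L = 10.0
n0 = 1e-3

# S(m) sampled on four equal subintervals of [-1, 1], each of width 0.5.
# Both profiles satisfy the constraint of Eq. 8.81: sum(S) * 0.5 == L.
uniform = [L / 2] * 4
nonuniform = [0.25 * L, 0.75 * L, 0.75 * L, 0.25 * L]

print(worst_case_mse(uniform, n0), worst_case_mse(nonuniform, n0))
```

With these assumed numbers the uniform profile attains $2\mathcal{N}_0/L^2$ exactly, while the nonuniform profile, which has the same total length, has a worst case that is four times larger.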



Figure 8.25 An example of nonuniform stretch and probability density that yields
$\overline{\epsilon^2} > 2\mathcal{N}_0/L^2$.

Maximum-likelihood receiver design. The design of a maximum-likelihood
receiver for the transmission of a single bounded input parameter
with twisted modulation is at least conceptually straightforward.
Given the signal locus and additive white Gaussian noise, when the
received signal is $\rho(t)$ we have

$$p_r(\rho \mid m = \alpha) \sim e^{-|\rho - s_\alpha|^2/\mathcal{N}_0} \sim \rho \cdot s_\alpha - \tfrac{1}{2}E_\alpha = \int_{-\infty}^{\infty} \rho(t)\, s_\alpha(t)\, dt - \tfrac{1}{2}E_\alpha, \tag{8.84}$$

in which $s_\alpha$, representing $s_\alpha(t)$, is the transmitted signal and $E_\alpha = |s_\alpha|^2$ is
the transmitted energy when $m = \alpha$. For a maximum-likelihood decision
the received signal must be correlated against each member of the entire
set of transmitter signals $\{s_\alpha(t)\}$, $-1 \le \alpha \le 1$; if $E_\alpha$ is independent of $\alpha$,
$\hat{m}$ is set equal to that value of $\alpha$ for which the correlation is maximum.

Figure 8.26 A discrete set of vectors $\{s_{\alpha_i}\}$ that approximates the locus $s_m$.

The maximum-likelihood receiver for linear modulation is a special
case of the receiver implied by Eq. 8.84. With linear modulation $s_\alpha(t) = \alpha A \varphi(t)$, so that

$$\int_{-\infty}^{\infty} \rho(t)\, s_\alpha(t)\, dt - \tfrac{1}{2}E_\alpha = \alpha A \rho - \tfrac{1}{2}\alpha^2 A^2, \tag{8.85a}$$

in which

$$\rho \triangleq \int_{-\infty}^{\infty} \rho(t)\, \varphi(t)\, dt. \tag{8.85b}$$

As we have already noted, only a single correlator is needed. The receiver
sets $\hat{m}$ equal to that value of $\alpha$ for which the right-hand side of Eq. 8.85a
is maximum. Differentiating, we have

$$0 = A\rho - \alpha A^2 \Big|_{\alpha = \hat{m}},$$

hence

$$\hat{m} = \frac{\rho}{A}, \tag{8.85c}$$

as expected.

The implementation of a maximum-likelihood receiver for an arbitrary
signal locus would involve an infinitum of correlators (or calculations),
one for each value of $\alpha$ in the interval $[-1, +1]$. In general, of course,
such a receiver cannot be built. One realizable approximation results from
specifying a finite set of $M$ points $\{\alpha_i\}$ equally spaced over $[-1, +1]$ and
correlating the received signal against each of the corresponding $M$
signals $\{s_{\alpha_i}(t)\}$. It is evident from Fig. 8.26 that the resulting performance
degradation is negligible when $M$ is chosen large enough that neighboring
$s_{\alpha_i}$ are approximately colinear and

$$\alpha_{i+1} - \alpha_i < \sqrt{\overline{\epsilon^2}}; \quad i = 0, 1, \ldots, M - 2, \tag{8.86}$$

in which $\overline{\epsilon^2}$ is the mean-square error with true maximum-likelihood
reception.
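The finite correlator bank just described can be sketched in a few lines of Python. This is a minimal illustration, not a receiver design from the text: the two-dimensional circular-arc locus `signal`, its `span` parameter, the grid size, and the noise values are assumptions chosen only so that the example is self-contained. The decision metric is the one of Eq. 8.84, the correlation minus half the signal energy.

```python
import math


def signal(alpha, es=1.0, span=0.9):
    """Hypothetical constant-energy locus: an arc of a circle of radius sqrt(Es)."""
    theta = span * math.pi * alpha
    r = math.sqrt(es)
    return (r * math.cos(theta), r * math.sin(theta))


def ml_estimate(received, num_points=201):
    """Correlate against M equally spaced candidates and keep the best one."""
    best_alpha, best_metric = -1.0, -math.inf
    for i in range(num_points):
        alpha = -1.0 + 2.0 * i / (num_points - 1)
        s = signal(alpha)
        energy = s[0] ** 2 + s[1] ** 2
        # Metric of Eq. 8.84: correlation minus half the signal energy.
        metric = received[0] * s[0] + received[1] * s[1] - 0.5 * energy
        if metric > best_metric:
            best_alpha, best_metric = alpha, metric
    return best_alpha


sent = signal(0.3)
noisy = (sent[0] + 0.01, sent[1] - 0.01)   # small, hand-picked perturbation
print(ml_estimate(sent), ml_estimate(noisy))
```

With 201 candidates the grid spacing is 0.01, so the estimate cannot be resolved more finely than that; Eq. 8.86 says the spacing should be small compared with the root-mean-square error expected from the noise.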

In the next section we examine pulse-position modulation and observe, 
as in linear modulation, that the exact maximum-likelihood receiver can 
be realized by means of a single matched filter. 

Pulse-Position Modulation 

A common example of a noise-suppressing twisted modulation scheme
is pulse-position modulation (abbreviated PPM). We first treat an
idealized single-parameter communication system. As shown in Fig. 8.27,
the bounded random input $m$ causes an impulse of value $1/\sqrt{2W}$ to be
generated at time $mT_0$. This impulse excites the ideal lowpass filter $W(f)$
of bandwidth $W$. The filter output is amplified and transmitted, so that

$$s_m(t) = \sqrt{E_s}\,\varphi(t - mT_0); \quad -1 \le m \le +1. \tag{8.87a}$$

Here

$$\varphi(t) = \sqrt{2W}\,\frac{\sin 2\pi W t}{2\pi W t} \tag{8.87b}$$

is a unit-energy waveform similar to those encountered in the sampling
theorem, except that the bandwidth is now $W$ rather than $W_m$.

The form of the maximum-likelihood PPM receiver follows immediately
from Eq. 8.84: since the transmitted energy is $E_s$, independent of the value
of $m$, we have

$$p_r(\rho \mid m = \alpha) \sim \int_{-\infty}^{\infty} \rho(\gamma)\,\varphi(\gamma - \alpha T_0)\, d\gamma. \tag{8.88}$$

But the right-hand side of Eq. 8.88 is just the output at time $t = \alpha T_0$ of a
matched filter, with impulse response

$$h(t) = \varphi(-t), \tag{8.89}$$

whose input is the received signal $\rho(t)$. Since $\varphi(-t) = \varphi(t)$, a maximum-likelihood
receiver need only pass the received signal through another
ideal bandlimiting filter $W(f)$, determine the time instant $\hat{t}$
at which the filter output is maximum, and set $\hat{m} = \hat{t}/T_0$. Such a receiver
is illustrated in Fig. 8.28.

Figure 8.28 Idealized maximum-likelihood PPM receiver.

Weak-noise suppression. Even though the set of all $\{s_m(t)\}$, $-1 \le m \le 1$,
cannot be described in a finite-dimensional signal space, the
arguments leading to Eqs. 8.79 remain valid. The mean-square error
produced by maximum-likelihood reception in weak noise may therefore
be determined by evaluating

$$S^2 = \int_{-\infty}^{\infty} \left|\frac{\partial s_m(t)}{\partial m}\right|^2 dt = E_s T_0^2 \int_{-\infty}^{\infty} [\varphi'(t - mT_0)]^2\, dt \tag{8.90}$$

and showing that the right-hand side is independent of $m$. The prime
notation in Eq. 8.90 means differentiation with respect to the argument.

To evaluate $S^2$ in the time domain would be difficult, but it is easy to do
so in the frequency domain. Since $\varphi(t)$ is the response of the ideal filter
$W(f)$ to an impulse of value $1/\sqrt{2W}$, the spectrum of $\varphi(t)$ is

$$\Phi(f) = \begin{cases} \dfrac{1}{\sqrt{2W}}; & |f| < W \\ 0; & \text{elsewhere.} \end{cases} \tag{8.91}$$

Differentiation in time corresponds to multiplication by $j2\pi f$ in frequency.
The spectrum of $\varphi'(t)$ is therefore $j2\pi f\,\Phi(f)$. From Parseval’s theorem,

$$\int_{-\infty}^{\infty} [\varphi'(t - mT_0)]^2\, dt = \int_{-\infty}^{\infty} |j2\pi f\,\Phi(f)|^2\, df = \frac{1}{2W}\int_{-W}^{W} (2\pi f)^2\, df = \frac{(2\pi W)^2}{3}. \tag{8.92}$$

Substitution in Eq. 8.90 yields

$$S^2 = \frac{E_s}{3}(2\pi T_0 W)^2, \tag{8.93}$$

which is independent of $m$. It follows from Eq. 8.79b that

$$\overline{\epsilon^2} \approx \frac{12}{\pi^2}\left(\frac{1}{4T_0W}\right)^2 \frac{\mathcal{N}_0}{2E_s} \tag{8.94}$$

when the noise is sufficiently weak that the probability of anomalous
reception is negligible. For somewhat stronger noise, Eq. 8.94 can be
interpreted as the conditional mean-square error, given that an anomaly
does not occur.

The significance of the quantity $4T_0W$ is important. Recall that the signal
bandwidth is $W$ and that the maximum of the (infinite-duration) signal
$s_m(t)$ is positioned over an interval whose total width is $2T_0$. Thus the
quantity $4T_0W$ is twice the product of the signal bandwidth and the
“signaling interval.” In view of Appendices 5A and 8A, the quantity
$4T_0W$ in some sense represents the dimensionality of the signaling set,
even though an infinite number of dimensions is required to represent
the infinite set of signals $\{s_m(t)\}$ exactly. We shall henceforth refer to
twice the product of the signaling bandwidth and signaling interval as
the effective dimensionality of the signal set and denote it $\beta$. For the PPM
signals under consideration

$$\beta = 4T_0W. \tag{8.95a}$$

Figure 8.29 Suppression of weak noise. The output, $u(t)$, of the receiver’s matched
filter is the sum of $s_m(t)$ and band-limited Gaussian noise. It is clear from the geometry
that steepening the skirts of $s_m(t)$ reduces the displacement of $\hat{t}$.

Equation 8.94 may then be rewritten in terms of $\beta$ as

$$\overline{\epsilon^2} \approx \frac{12}{\pi^2\beta^2}\,\frac{\mathcal{N}_0}{2E_s}. \tag{8.95b}$$
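The frequency-domain evaluation of the stretch can be verified numerically. The following Python sketch is an illustration under assumed parameter values (`es`, `n0`, `t0`, `w` are hypothetical): it integrates $|j2\pi f\,\Phi(f)|^2$ over the band $|f| < W$ by the midpoint rule, assuming $|\Phi(f)|^2 = 1/2W$ there as the ideal-filter description implies, and compares $(\mathcal{N}_0/2)/S^2$ with the closed form $(12/\pi^2)(1/4T_0W)^2(\mathcal{N}_0/2E_s)$.

```python
import math


def stretch_squared(es, t0, w, steps=10000):
    """S^2 = Es*T0^2 * integral of |j 2 pi f Phi(f)|^2 df, |Phi|^2 = 1/(2W) on |f| < W."""
    df = 2.0 * w / steps
    integral = 0.0
    for k in range(steps):
        f = -w + (k + 0.5) * df                      # midpoint rule
        integral += (2.0 * math.pi * f) ** 2 / (2.0 * w) * df
    return es * t0 ** 2 * integral


def ppm_weak_noise_mse(es_over_n0, t0, w):
    """Closed form: (12/pi^2) * (1/(4*T0*W))^2 * N0/(2*Es)."""
    beta = 4.0 * t0 * w
    return (12.0 / math.pi ** 2) * (1.0 / beta) ** 2 / (2.0 * es_over_n0)


# Hypothetical operating point: beta = 4*T0*W = 20.
es, n0, t0, w = 2.0, 0.1, 1e-3, 5e3
numeric = (n0 / 2.0) / stretch_squared(es, t0, w)
closed = ppm_weak_noise_mse(es / n0, t0, w)
print(numeric, closed)
```

The two numbers agree to within the quadrature error, which for the midpoint rule on a quadratic integrand falls off as the inverse square of the number of steps.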

With linear modulation, the mean-square error is $\mathcal{N}_0/2E_s$. Thus PPM
achieves a weak-noise advantage over linear modulation† which increases
as the square of the effective dimensionality. The larger the product
$4T_0W$, the greater the improvement, as long as an anomaly does not
occur.

† The advantage enjoyed by PPM over linear modulation is decreased if the comparison
is made on the basis of average rather than peak energy, for PPM $\overline{E_m} = E_s$, whereas
for linear modulation, with $m$ uniformly distributed over $[-1, 1]$, $\overline{E_m} = E_s/3$.

Insight into the physical phenomena by means of which weak noise is
suppressed can be gained from Fig. 8.29: When the bandwidth $W$ of
$\varphi(t)$ is much greater than $1/2T_0$, the effective duration of $\varphi(t)$ is short
compared with $2T_0$ and the signal pulse at the matched-filter output has
steep skirts. Weak noise at the output of the matched filter adds to the
signal output and causes the location of the maximum of the sum to be
displaced slightly from the signal maximum. The greater the signal
bandwidth, the steeper the skirts of the signal pulse and the smaller the
mean-square value of the displacement.

Figure 8.30 Anomalous reception in strong noise.

Threshold. Exact analysis of the threshold behavior with maximum- 
likelihood reception of PPM appears to be both difficult and unrewarding. 
It is possible, however, to make an approximate analysis that places the 
fundamental phenomenon clearly in evidence and agrees remarkably well 
with experimental measurements. In the arguments that follow we use 
m 0 to denote the actual value assumed by the transmitter input variable m. 

Consider Fig. 8.30, which depicts the output $u(t)$ of the matched filter
$W(f)$ when the input is $\sqrt{E_s}\,\varphi(t - t_0) + n_w(t)$. The signal component of
$u(t)$ attains its maximum at time $t_0 = m_0T_0$ and weak noise usually causes
the observed maximum to be shifted from t 0 by only a small amount. But 
strong noise may cause the maximum to occur far from t 0 and thereby 
introduce an anomalous error, as shown in the illustration. 

To calculate an approximation to the probability of an anomalous
error, let us focus attention on the finite set of instants $\{t_i\}$, $i$ an integer,
defined by

$$t_i = t_0 + \frac{i}{2W}; \quad -T_0 \le t_i \le T_0. \tag{8.96a}$$

It is clear from Fig. 8.31 that for any $t_0$ in $[-T_0, T_0]$, hence for any $m_0$
in the allowable range $-1 \le m \le 1$, there are substantially

$$\beta = 4T_0W \tag{8.96b}$$

such instants, in which $\beta$ is the effective dimensionality of the signaling set.

Figure 8.31 The $\{t_i\}$ are constructed by marking off increments of length $1/2W$ on
both sides of $t_0$.

Figure 8.32 Possible waveforms at the matched filter output of an idealized PPM
receiver: (a) signal component at matched filter output; (b), (c) anomalous outputs
included in the event $B$; (d) anomalous output excluded from $B$; (e) nonanomalous
output in $B$.

Equation 8.96a and the properties of the sampling function $\varphi(t)$ guarantee
that

$$\int_{-\infty}^{\infty} \varphi(t - t_i)\,\varphi(t - t_j)\, dt = \delta_{ij}$$

for any pair of instants $(t_i, t_j)$ in the set $\{t_i\}$. Since the signals $\{\varphi(t - t_i)\}$
are orthogonal, the probability that $u(t)$ will be larger at one of the incorrect
time instants in the set $\{t_i\}$ than at $t_0$ is close to the probability of
error in the communication of one of $\beta$ equally likely messages when
using equal-energy orthogonal signals. Letting $B$ denote the event

$$B = \{u(t_j) > u(t_0) \text{ for at least one } t_j \text{ in } \{t_i\}\},$$

we have from Eqs. 4.96a, 4.111, and 2.121


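$$P[B] \approx 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi\mathcal{N}_0}}\, e^{-(\alpha - \sqrt{E_s})^2/\mathcal{N}_0} \left[1 - Q\!\left(\frac{\alpha}{\sqrt{\mathcal{N}_0/2}}\right)\right]^{\beta - 1} d\alpha \le (\beta - 1)\, Q\!\left(\sqrt{E_s/\mathcal{N}_0}\right) < (\beta - 1)\,\frac{e^{-E_s/2\mathcal{N}_0}}{\sqrt{2\pi E_s/\mathcal{N}_0}}. \tag{8.97}$$

The bound is very tight under usual conditions of operation.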

It appears that the probability of error for the orthogonal signals
$\{\sqrt{E_s}\,\varphi(t - t_j)\}$ provides a reasonable approximation to what one wishes
to mean by the probability of anomaly, say $P[\mathcal{A}]$, in our idealized PPM
system. There are two difficulties in making the statement more precise.
By far the most important is the logical impossibility of dichotomizing the
infinitum of all possible waveforms $u(t)$ at the filter output in Fig. 8.28
into disjoint subsets labeled “anomalous” and “nonanomalous.” As
shown in Fig. 8.32, however, the event $B$ does include most cases that we
would reasonably call anomalous, although it also excludes some that
obviously are anomalous and includes others that obviously are not.

Figure 8.33 The signals $\varphi(t - t_0)$ and $\varphi(t - mT_0)$ are not orthogonal for all $m$ in
$[-1, 1]$. It is easy to prove that

$$\int_{-\infty}^{\infty} \varphi(t)\,\varphi(t - \tau)\, dt = \frac{\sin 2\pi W\tau}{2\pi W\tau},$$

so that the worst case, illustrated in (a), occurs when $mT_0$ falls on the first positive
sidelobe of this function, near $t_0 + 5/4W$. The geometrical
relation of the two signals in (a) is illustrated in (b).

The second difficulty is illustrated in Fig. 8.33: it is evident that the
Euclidean distance between $\sqrt{E_s}\,\varphi(t - mT_0)$ and $\sqrt{E_s}\,\varphi(t - t_0)$ is less than
$\sqrt{2E_s}$ for certain values of $mT_0$ between the $\{t_i\}$. These intermediate values
could conceivably contribute more to the probability of anomaly than do
the orthogonal points. But this possibility evanesces when we consider
actual PPM signals in which filter ringing is carefully minimized by design.
For example, measurements have been performed¹ on a laboratory
model of a maximum-likelihood PPM receiver which uses the approximately
Gaussian pulse shown in Fig. 8.34. The effective dimensionality $\beta$
was defined as $4T_0$A, with A taken as one half the width of the pulse
between the −10 db points, and an anomaly was said to occur whenever
the receiver output $\hat{m}$ differed by more than ±A from the transmitted
message $m$. The close experimental agreement between the relative
frequency of anomalies and the equation

$$P[\mathcal{A}] \approx P[B] \approx (\beta - 1)\,\frac{e^{-E_s/2\mathcal{N}_0}}{\sqrt{2\pi E_s/\mathcal{N}_0}} \tag{8.98}$$

is evident from Fig. 8.35. We conclude that Eq. 8.98 provides a good
estimate of the anomalous behavior in white Gaussian noise of a maximum-
likelihood PPM system using well-designed signals.
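A quick numerical comparison shows how close the exponential estimate is to the union bound from which it derives. The Python sketch below assumes that Eq. 8.98 has the form $(\beta - 1)\,e^{-E_s/2\mathcal{N}_0}/\sqrt{2\pi E_s/\mathcal{N}_0}$, that is, the union bound $(\beta - 1)\,Q(\sqrt{E_s/\mathcal{N}_0})$ for $\beta$ equal-energy orthogonal signals combined with the standard upper bound on the Gaussian tail. The function names and the sample operating point are our own and purely illustrative.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x), written with the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def anomaly_union_bound(beta, es_over_n0):
    """Union bound for beta equal-energy orthogonal signals."""
    return (beta - 1) * q_function(math.sqrt(es_over_n0))


def anomaly_estimate(beta, es_over_n0):
    """Assumed form of Eq. 8.98: union bound with Q(x) < exp(-x^2/2)/(x*sqrt(2*pi))."""
    return (beta - 1) * math.exp(-es_over_n0 / 2.0) / math.sqrt(2.0 * math.pi * es_over_n0)


# Hypothetical operating point: beta = 15, Es/N0 = 20.
beta, snr = 15, 20.0
print(anomaly_union_bound(beta, snr), anomaly_estimate(beta, snr))
```

At this operating point the two expressions differ by only a few percent, and the gap shrinks further as $E_s/\mathcal{N}_0$ grows.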



Figure 8.35 The experimental points are the relative frequency of anomaly with PPM;
the solid curves are the true $P[\mathcal{E}]$ for $\beta$ equally likely orthogonal signals. Equation 8.98
results when the true $P[\mathcal{E}]$ is approximated by the union bound; the approximation
improves as $P[\mathcal{E}]$ decreases, as indicated by the dashed line for $\beta = 15$.

Figure 8.36 Graphical solution for $E_s/\mathcal{N}_0$ and $\beta$ for PPM.

The estimate of $P[\mathcal{A}]$ in Eq. 8.98, together with the mean-square error
result of Eq. 8.95, permits the design of efficient PPM systems. Once the
maximum tolerable (threshold) value of $P[\mathcal{A}]$ is specified, the value of
$E_s/\mathcal{N}_0$ required by Eq. 8.98 may be plotted as a function of $\beta$. A similar
plot can be made from Eq. 8.95b for the maximum tolerable $\overline{\epsilon^2}$ in the
absence of anomalies. As shown in Fig. 8.36, the intersection of these two
plots yields the approximate minimum-acceptable value of $E_s/\mathcal{N}_0$ and
the corresponding required value of $\beta$.
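The graphical procedure can also be carried out numerically. The Python sketch below is one possible rendering under stated assumptions: it takes the anomaly estimate to be $(\beta - 1)\,e^{-E_s/2\mathcal{N}_0}/\sqrt{2\pi E_s/\mathcal{N}_0}$ and the weak-noise error to be $(12/\pi^2\beta^2)(\mathcal{N}_0/2E_s)$, restricts $\beta$ to integers, and uses hypothetical tolerances of $10^{-4}$ for both quantities. For each $\beta$ it finds the $E_s/\mathcal{N}_0$ needed by each requirement and keeps the $\beta$ for which the larger of the two is smallest.

```python
import math


def anomaly_estimate(beta, snr):
    """Assumed form of the anomaly estimate; snr stands for Es/N0."""
    return (beta - 1) * math.exp(-snr / 2.0) / math.sqrt(2.0 * math.pi * snr)


def snr_for_anomaly(beta, p_max):
    """Smallest Es/N0 meeting the anomaly tolerance, found by bisection."""
    lo, hi = 1e-3, 1e3            # estimate is decreasing in snr on this bracket
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if anomaly_estimate(beta, mid) > p_max:
            lo = mid
        else:
            hi = mid
    return hi


def snr_for_mse(beta, mse_max):
    """Es/N0 at which (12/pi^2)(1/beta^2)(N0/2Es) equals mse_max."""
    return 6.0 / (math.pi ** 2 * beta ** 2 * mse_max)


def design(p_max, mse_max, betas=range(2, 2001)):
    """Return (beta, Es/N0) minimizing the Es/N0 that satisfies both requirements."""
    best = None
    for beta in betas:
        snr = max(snr_for_anomaly(beta, p_max), snr_for_mse(beta, mse_max))
        if best is None or snr < best[1]:
            best = (beta, snr)
    return best


beta_opt, snr_opt = design(p_max=1e-4, mse_max=1e-4)
print(beta_opt, snr_opt)
```

The minimizing $\beta$ sits at the crossing of the two requirement curves, which is the intersection that Fig. 8.36 locates graphically.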

A one-parameter characterization of the performance of continuous
PPM may be obtained by combining the mean-square error in the absence
of anomalous errors with the mean-square error contributed by the
anomalies themselves. When an anomaly occurs, $\hat{m}$ is substantially
equally likely to be anywhere in the interval $[-1, 1]$, regardless of the
value of $m$. ($\beta$ mutually orthogonal signals are completely symmetric.)
Furthermore, the event “anomaly” is independent of $m$. Thus

$$E_{\mathcal{A}}[(m - \hat{m})^2] = E_{\mathcal{A}}[m^2] + E_{\mathcal{A}}[\hat{m}^2] = \overline{m^2} + \frac{1}{2}\int_{-1}^{1} \alpha^2\, d\alpha = \overline{m^2} + \frac{1}{3}, \tag{8.99}$$

in which $E_{\mathcal{A}}[\ ]$ denotes expectation conditioned on the occurrence of an
anomaly. If we assume that $m$ is uniformly distributed over $[-1, 1]$ and
denote the combined mean-square error by $\overline{\epsilon_T^2}$, Eqs. 8.95 and 8.98 yield†

$$\overline{\epsilon_T^2} \approx \overline{\epsilon^2}\,(1 - P[\mathcal{A}]) + \tfrac{2}{3}P[\mathcal{A}] \approx \frac{12}{\pi^2}\left(\frac{1}{\beta}\right)^2 \frac{\mathcal{N}_0}{2E_s} + \frac{2}{3}\,\frac{\beta - 1}{\sqrt{2\pi E_s/\mathcal{N}_0}}\, e^{-E_s/2\mathcal{N}_0}. \tag{8.100}$$


Equation 8.100 may be plotted in terms of the output signal-to-noise
ratio, defined in accordance with Eq. 8.21a as $\overline{m^2}/\overline{\epsilon_T^2}$. Typical plots are
given in Fig. 8.37. The knee of the output signal-to-noise ratio curve is
referred to as the “threshold region.”

The use of $\overline{\epsilon_T^2}$ as a single-parameter characterization of performance
requires a certain amount of caution: the statistical properties and the
effect of anomalies and of additive Gaussian noise are very different. In
particular, if $\overline{\epsilon_T^2}$ truly measured user satisfaction, it would be good
engineering practice to minimize $\overline{\epsilon_T^2}$ for a fixed value of $E_s/\mathcal{N}_0$ by proper
choice of $\beta$. Such a choice leads to a value of $P[\mathcal{A}]$ which is too large to be
tolerable in most applications (such as speech). Thus a design procedure
utilizing $\overline{\epsilon^2}$ and $P[\mathcal{A}]$ separately seems preferable.
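The threshold knee is easy to reproduce numerically. The Python sketch below assumes that the combined error has the form $\overline{\epsilon^2}(1 - P[\mathcal{A}]) + \tfrac{2}{3}P[\mathcal{A}]$ with $m$ uniform on $[-1, 1]$ (so that $\overline{m^2} = 1/3$), and uses the same assumed expressions for $\overline{\epsilon^2}$ and $P[\mathcal{A}]$ as the surrounding equations. The function names, the clipping of the anomaly estimate at 1, and the choice $\beta = 20$ are illustrative assumptions only.

```python
import math


def anomaly_estimate(beta, snr):
    """Assumed anomaly estimate, clipped at 1 so it remains a probability."""
    p = (beta - 1) * math.exp(-snr / 2.0) / math.sqrt(2.0 * math.pi * snr)
    return min(1.0, p)


def weak_noise_mse(beta, snr):
    """(12/pi^2)(1/beta^2)(N0/2Es), with snr = Es/N0."""
    return 6.0 / (math.pi ** 2 * beta ** 2 * snr)


def total_mse(beta, snr):
    """Combined error: weak-noise term when no anomaly, 2/3 when an anomaly occurs."""
    p_a = anomaly_estimate(beta, snr)
    return weak_noise_mse(beta, snr) * (1.0 - p_a) + (2.0 / 3.0) * p_a


def output_snr_db(beta, snr):
    """Output signal-to-noise ratio (mean-square m)/(total error) in decibels."""
    return 10.0 * math.log10((1.0 / 3.0) / total_mse(beta, snr))


for snr in (8.0, 12.0, 16.0, 20.0, 40.0):
    print(snr, round(output_snr_db(20, snr), 2))
```

Well above threshold the output ratio tracks the weak-noise term alone; below it the $\tfrac{2}{3}P[\mathcal{A}]$ term dominates and the curve falls away sharply.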

Discussion. With idealized PPM, the increased stretch necessary for
weak-noise suppression is obtained by twisting the signal locus more or
less onto a $\beta$-dimensional sphere with fixed radius $\sqrt{E_s}$. The resulting locus
may be visualized roughly as indicated in Fig. 8.38: the signal curve is
wound onto the sphere very loosely, with “coils” looping from one
orthogonal axis to the next without ever coming into proximity. It is
because of this that the probability of anomaly can be estimated by
examining only $\beta$ orthogonal points along the locus.

Of course, it is not true that $P[\mathcal{A}]$ can be approximated in this way when
the signal locus is arbitrary. Indeed, the approximation is not even valid
for PPM when the transmitting waveshape is not judiciously designed.

† Equation 8.100 is written as an approximation primarily because it neglects the effect
of higher order terms in the Taylor series approximation of Eq. 8.76. See Problem 8-15.

Figure 8.37 Signal-to-noise ratio of ordinary PPM when $m$ is uniformly distributed.

As an example, consider position modulating an arbitrary unit-energy
pulse, say $x(t)$, so that

$$s_m(t) = \sqrt{E_s}\, x(t - mT_0). \tag{8.101a}$$

The weak-noise performance is again determined by the stretch

$$S^2 = \int_{-\infty}^{\infty} \left|\frac{\partial s_m(t)}{\partial m}\right|^2 dt = E_s T_0^2 \int_{-\infty}^{\infty} [x'(t - mT_0)]^2\, dt,$$

in which the prime denotes differentiation with respect to the argument.
If $X(f)$ is the Fourier transform of $x(t)$, then $j2\pi f\, X(f)$ is the transform of
$x'(t)$. It follows from Parseval’s theorem that

$$S^2 = E_s T_0^2 \int_{-\infty}^{\infty} |j2\pi f\, X(f)|^2\, df = \frac{E_s}{3}(2\pi W_x T_0)^2, \tag{8.101b}$$

in which we have introduced the definition

$$W_x^2 \triangleq 3\int_{-\infty}^{\infty} f^2\, |X(f)|^2\, df. \tag{8.101c}$$

The factor 3 in Eq. 8.101c normalizes the definition so that $W_x = W$ when
$X(f)$ is the ideal normalized rectangular filter function.

In terms of $W_x$, the mean-square error in the absence of anomaly is

$$\overline{\epsilon^2} \approx \frac{12}{\pi^2}\left(\frac{1}{4W_xT_0}\right)^2 \frac{\mathcal{N}_0}{2E_s}. \tag{8.102}$$

Figure 8.38 Approximate projection of a portion of the idealized PPM locus onto a
three-dimensional sphere of radius $\sqrt{E_s}$. For $mT_0 = j/2W$, we have $s_m = \sqrt{E_s}\,\varphi_j$,
$j = i - 1, i, i + 1$. The looping of the locus for intermediate values of $m$ reflects
the nonorthogonality evidenced in Fig. 8.33.


Thus the weak-noise performances afforded by any two PPM systems for
which $W_x$ is the same are equivalent.

In order to see that this equivalence does not extend to the probability
of anomaly, we need consider only the situation in which $x(t)$ is specialized
to

$$x(t) = \begin{cases} \dfrac{1}{\sqrt{A}} \sin 2\pi f_0 t; & -A < t < A,\ f_0 A \text{ an integer} \\ 0; & \text{elsewhere.} \end{cases} \tag{8.103a}$$

It can be verified that $\int_{-\infty}^{\infty} [x'(t)]^2\, dt = (2\pi f_0)^2$, so

$$W_x = \sqrt{3}\, f_0. \tag{8.103b}$$

When $f_0$ is chosen as $(1/\sqrt{3})W$ the mean-square error in weak noise
afforded by $x(t)$ is therefore the same as that afforded by position-modulating
$\varphi(t)$, the impulse response of an ideal normalized rectangular filter
of bandwidth $W$. This is true regardless of the duration, $A$, of $x(t)$.
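The claim that $W_x$ does not depend on the pulse duration can be checked directly. The Python sketch below uses the time-domain equivalent of Eq. 8.101c, $W_x^2 = (3/4\pi^2)\int [x'(t)]^2\,dt$, which follows from Parseval’s theorem, and integrates numerically for the sinusoidal burst $x(t) = (1/\sqrt{A})\sin 2\pi f_0 t$ on $-A < t < A$. The values of `f0` and of the half-duration are hypothetical, chosen so that their product is an integer.

```python
import math


def effective_bandwidth(f0, half_duration, steps=20000):
    """W_x = sqrt(3 * integral of x'(t)^2 dt) / (2*pi) for the sinusoidal burst."""
    dt = 2.0 * half_duration / steps
    integral = 0.0
    for k in range(steps):
        t = -half_duration + (k + 0.5) * dt           # midpoint rule
        dx = (2.0 * math.pi * f0 / math.sqrt(half_duration)) * math.cos(2.0 * math.pi * f0 * t)
        integral += dx * dx * dt
    return math.sqrt(3.0 * integral) / (2.0 * math.pi)


f0 = 5.0
short_burst = effective_bandwidth(f0, 0.2)    # f0 * A = 1 cycle per half-duration
long_burst = effective_bandwidth(f0, 2.0)     # f0 * A = 10
print(short_burst, long_burst, math.sqrt(3.0) * f0)
```

Both durations return $\sqrt{3}\,f_0$, which is why the weak-noise error cannot distinguish a short burst from a long one even though their correlation functions differ greatly.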

On the other hand, the probability of anomaly with $x(t)$ depends
critically on $A$, a fact that is clarified by Fig. 8.39, in which we plot the
output of the filter matched to $x(t)$: when $A$ is large compared with $1/f_0$,
this output, which is just the correlation function of $x(t)$, has many local
maxima of slowly decreasing amplitude. It is evident that these local
maxima imply an inordinately high probability of anomaly when noise is
present. Moreover, if we attempt to decrease $\overline{\epsilon^2}$ by choosing $f_0 > (1/\sqrt{3})W$,
the number of local maxima, hence $P[\mathcal{A}]$, becomes even larger.

Figure 8.39 A signal whose correlation function is sensitive to anomalies.

Figure 8.40 A signal correlation function for which the probability of anomaly may
be estimated by means of Eq. 8.98.

Estimating $P[\mathcal{A}]$ by counting the number of essentially orthogonal pulse
positions at the output of the receiver’s matched filter is valid if and only if
the correlation function of $x(t)$ is relatively “compact,” as in Fig. 8.40.
The equivalent geometrical requirement is that the signal locus be loosely
coiled. This condition must be satisfied if an unwarranted susceptibility
to anomalies is to be avoided.

Antipodal PPM. It is possible to increase stretch without changing
the effective number of dimensions occupied by a PPM signal, and
without violating the loosely coiled condition, by adopting the antipodal
signaling set

$$s_m(t) = \begin{cases} -\sqrt{E_s}\,\varphi(t - 2|m|T_0); & -1 \le m < 0 \\ +\sqrt{E_s}\,\varphi(t - 2|m|T_0); & 0 \le m \le 1. \end{cases} \tag{8.104}$$

The effect of dividing the range of $m$ into two parts is to increase the
stretch by 2, which reduces $\overline{\epsilon^2}$ by the factor $\tfrac{1}{4}$. We have

$$\overline{\epsilon^2} \approx \frac{12}{\pi^2}\left(\frac{1}{2\beta}\right)^2 \frac{\mathcal{N}_0}{2E_s}; \quad \text{for antipodal PPM,} \tag{8.105a}$$

in which

$$\beta = 4T_0W \tag{8.105b}$$

is again the effective dimensionality of the signal space.

The effect of antipodal PPM on the probability of anomaly is evident
from the structure of the maximum-likelihood receiver. As illustrated in
Fig. 8.41, such a receiver first determines the instant $\hat{t}$ within the interval
$[0, 2T_0]$ for which the magnitude of the matched-filter output is greatest
and then sets

$$\hat{m} = \pm\frac{\hat{t}}{2T_0}$$

in accordance with a positive or negative filter output at $t = \hat{t}$.

Figure 8.41 Maximum likelihood receiver for antipodal PPM. (The matched filter is
assumed to have zero delay.)

The considerations that led us to estimate the probability of anomaly in
terms of the probability of error for $\beta$ discrete orthogonal signals are still
germane. Now, however, it is apparent from Fig. 8.41 that these same
arguments lead us to estimate $P[\mathcal{A}]$ in terms of the probability of error for
$2\beta$ equally likely biorthogonal signals, each with energy $E_s$. From Eqs.
4.112 and 2.121 we have

$$P[\mathcal{A}] \approx (2\beta - 1)\,\frac{e^{-E_s/2\mathcal{N}_0}}{\sqrt{2\pi E_s/\mathcal{N}_0}}; \quad \text{for antipodal PPM.} \tag{8.105c}$$

We note from Eqs. 8.105a and c that antipodal PPM with effective
dimensionality $\beta$ is equivalent, both with regard to $\overline{\epsilon^2}$ and $P[\mathcal{A}]$, to ordinary
PPM with effective dimensionality $2\beta$.

Although the weak-noise performance of antipodal PPM is 6 db
superior to that of ordinary PPM for the same value of $\beta$, it is not necessarily
easy to attain this improvement in practice. The reason, of course,
is that actual systems use bandpass rather than lowpass signals. The
usual technique is to use DSB-SC to heterodyne from baseband to RF and
then back down again. Antipodal PPM then requires an accurate receiver
phase reference, whereas ordinary PPM is not particularly sensitive to RF
phase. Indeed, an ordinary PPM receiver may consist of a bandpass
matched filter, followed by an envelope detector, without material
degradation of performance when $E_s/\mathcal{N}_0 \gg 1$.
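The equivalence between antipodal PPM at dimensionality $\beta$ and ordinary PPM at $2\beta$, and the 6 db figure, can be confirmed with a few lines of Python. The expressions coded below are assumed forms: $(12/\pi^2)(1/\beta)^2(\mathcal{N}_0/2E_s)$ and $(\beta - 1)\,e^{-E_s/2\mathcal{N}_0}/\sqrt{2\pi E_s/\mathcal{N}_0}$ for ordinary PPM, with $\beta$ replaced by $2\beta$ in both for the antipodal case. The function names and sample values are hypothetical.

```python
import math


def mse_ordinary(beta, snr):
    """Weak-noise error of ordinary PPM; snr stands for Es/N0."""
    return (12.0 / math.pi ** 2) * (1.0 / beta) ** 2 / (2.0 * snr)


def mse_antipodal(beta, snr):
    """Weak-noise error of antipodal PPM: stretch doubled, so beta -> 2*beta."""
    return (12.0 / math.pi ** 2) * (1.0 / (2.0 * beta)) ** 2 / (2.0 * snr)


def tail_factor(snr):
    """Common factor exp(-Es/2N0)/sqrt(2*pi*Es/N0) in both anomaly estimates."""
    return math.exp(-snr / 2.0) / math.sqrt(2.0 * math.pi * snr)


def anomaly_ordinary(beta, snr):
    return (beta - 1) * tail_factor(snr)


def anomaly_antipodal(beta, snr):
    return (2 * beta - 1) * tail_factor(snr)


beta, snr = 10, 25.0
gain_db = 10.0 * math.log10(mse_ordinary(beta, snr) / mse_antipodal(beta, snr))
print(gain_db)
```

The printed gain is $10\log_{10}4$, the 6 db improvement quoted above, and both antipodal quantities coincide with the ordinary-PPM values evaluated at $2\beta$.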

Waveform communication with PPM. We next consider the efficiency
of PPM in communicating an ideally band-limited random process

$$m(t) = \sum_{k=-\infty}^{\infty} m_k\, \psi_k(t) \tag{8.106}$$

with lowpass bandwidth $W_m$. For convenience, we assume that the
process has been so normalized that $|m_k| \le 1$ for all $k$.

With PPM each $m_k$ is transmitted and estimated by a (maximum-likelihood)
receiver in succession. The receiver then constructs

$$\hat{m}(t) = \sum_{k=-\infty}^{\infty} \hat{m}_k\, \psi_k(t). \tag{8.107}$$

In the absence of anomaly, we have

$$\hat{m}_k = m_k + n_k; \quad \text{for all } k, \tag{8.108}$$

in which each $n_k$ is a statistically independent zero-mean Gaussian random
variable with variance $\overline{\epsilon^2}$ given by Eq. 8.95b. It follows from Eq. 8.50 that

$$\hat{m}(t) = m(t) + n(t), \tag{8.109a}$$

in which

$$n(t) = \sum_{k=-\infty}^{\infty} n_k\, \psi_k(t) \tag{8.109b}$$

is a stationary Gaussian process with power density

$$\mathcal{S}_n(f) = \begin{cases} \overline{\epsilon^2}; & |f| < W_m \\ 0; & \text{elsewhere.} \end{cases} \tag{8.109c}$$

The average power of the additive noise for ordinary PPM in weak noise is
therefore

$$\overline{n^2(t)} = 2W_m\,\overline{\epsilon^2} = \frac{12}{\pi^2}\left(\frac{1}{\beta}\right)^2 \frac{\mathcal{N}_0 W_m}{E_s}; \quad \text{for ordinary PPM,} \tag{8.110}$$

whereas with linear modulation we found $\overline{n^2(t)} = \mathcal{N}_0 W_m/E_s$. Increasing
the effective dimensionality, $\beta$, yields weak-noise suppression without an
increase in the peak transmitted energy per sample.

The value of $\beta$ in Eq. 8.110 is related to the transmission bandwidth $W$
and the modulating bandwidth $W_m$: the time available for the transmission
of each $m_k$ is the sampling interval, $1/2W_m$. The maximum
allowable position deviation for each transmission is therefore constrained
by

$$2T_0 \le \frac{1}{2W_m}. \tag{8.111}$$

If, as shown in Fig. 8.42, a nominal interval of $1/2W$ sec is included at both
ends of every sampling interval to guard against overlap of transmitted
pulses into adjacent intervals, we have

$$\frac{1}{2W_m} = 2\left(\frac{1}{2W}\right) + 2T_0.$$

We see that the effective dimensionality is related to the bandwidth expansion
ratio, $W/W_m$, by

$$\frac{W}{W_m} = \beta + 2. \tag{8.112}$$

When PPM is used in conjunction with DSB-SC modulation, $W$ is
identified as one half the RF bandwidth.
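The bookkeeping that links bandwidth expansion to weak-noise improvement can be sketched as follows. The Python fragment assumes the guard-interval relation $W/W_m = \beta + 2$ of Eq. 8.112 and compares the weak-noise error $(12/\pi^2\beta^2)(\mathcal{N}_0/2E_s)$ with the linear-modulation value $\mathcal{N}_0/2E_s$; the function names and the bandwidths (60 kc and 5 kc) are hypothetical.

```python
import math


def ppm_effective_dimensionality(w, wm):
    """beta = W/Wm - 2, from one guard interval of 1/2W at each end."""
    return w / wm - 2.0


def ppm_max_deviation(w, wm):
    """T0 = beta/(4W), since beta = 4*T0*W."""
    return ppm_effective_dimensionality(w, wm) / (4.0 * w)


def weak_noise_gain_db(beta):
    """Error reduction relative to linear modulation: pi^2 * beta^2 / 12, in decibels."""
    return 10.0 * math.log10(math.pi ** 2 * beta ** 2 / 12.0)


w, wm = 60e3, 5e3                      # hypothetical bandwidths, W/Wm = 12
beta = ppm_effective_dimensionality(w, wm)
t0 = ppm_max_deviation(w, wm)
print(beta, t0, weak_noise_gain_db(beta))
```

For this assumed expansion ratio of 12 the effective dimensionality is 10, the two guard intervals plus $2T_0$ exactly fill the sampling interval $1/2W_m$, and the weak-noise advantage is a little over 19 db.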

As already noted, it is possible by means of antipodal PPM to retain a
fixed bandwidth expansion ratio $W/W_m$ while achieving the performance
that ordinary PPM achieves only when the effective dimensionality is
$2\beta = 2(W/W_m - 2)$. The resulting increase in the efficiency of channel
spectrum utilization may be desirable when $E_s \gg \mathcal{N}_0$ if the allowable bandwidth
is limited and an RF phase reference can be made available. It may
even be desirable to compound antipodal signaling with quadrature
DSB-SC multiplexing; that is, to transmit

$$s_m(t) = \begin{cases} +\sqrt{2E_s}\,\varphi(t - 4|m + \tfrac{1}{2}|T_0)\cos\omega_0 t; & -1 \le m < -\tfrac{1}{2} \\ -\sqrt{2E_s}\,\varphi(t - 4|m|T_0)\cos\omega_0 t; & -\tfrac{1}{2} \le m < 0 \\ +\sqrt{2E_s}\,\varphi(t - 4|m|T_0)\sin\omega_0 t; & 0 \le m < \tfrac{1}{2} \\ -\sqrt{2E_s}\,\varphi(t - 4|m - \tfrac{1}{2}|T_0)\sin\omega_0 t; & \tfrac{1}{2} \le m \le 1. \end{cases} \tag{8.113a}$$

Such a quadrature-antipodal system multiplies the effective dimensionality
of ordinary PPM by 4, still without increasing the RF bandwidth
required for transmission. The values of $\overline{\epsilon^2}$ and $P[\mathcal{A}]$ are then given by Eqs.
8.95b and 8.98 with

$$\beta = 4(4T_0W). \tag{8.113b}$$


Frequency-Position Modulation 

A second example of a single-input-parameter twisted modulation
scheme is frequency-position modulation (abbreviated FPM). With
ordinary FPM, the transmitted signal is a $2T$-sec pulse of sine wave, the
frequency of which is determined by the transmitter input. We take

$$s_m(t) = \sqrt{E_s}\,\varphi_m(t) \tag{8.114a}$$

with

$$\varphi_m(t) = \begin{cases} \dfrac{1}{\sqrt{T}}\cos 2\pi(f_0 + mW_0)t; & -T \le t \le T \\ 0; & \text{elsewhere.} \end{cases} \tag{8.114b}$$

The factor $1/\sqrt{T}$ normalizes $\varphi_m(t)$ so that the transmitted energy is
essentially equal to $E_s$ when $f_0 \gg W_0$, which is the case of interest. We
assume that the random variable $m$ is confined to $[-1, +1]$.

As the name implies, FPM is the frequency-domain equivalent of
idealized PPM. The equivalence is obvious when Fig. 8.43 is compared
with Fig. 8.27. It follows that the stretch factor for FPM is

$$S^2 = \frac{E_s}{3}(2\pi T W_0)^2, \tag{8.115}$$

a conclusion that may be readily verified† from Eqs. 8.79a and 8.114. In
the absence of anomaly, the mean-square error with maximum-likelihood
reception is therefore again

$$\overline{\epsilon^2} \approx \frac{12}{\pi^2}\left(\frac{1}{\beta}\right)^2 \frac{\mathcal{N}_0}{2E_s}, \tag{8.116a}$$

in which the effective dimensionality of the signal set is‡

$$\beta = 4TW_0; \quad \text{for FPM.} \tag{8.116b}$$

f In Eq. 8.115 we neglect the double-frequency term in evaluating the integral of Eq. 
8.79a. 

$ The effective dimensionality is the product of effective bandwidth 2 fV 0 and the signal 
duration 2T. The usual factor 2 multiplying the time-bandwidth product is omitted, 
since the quadrature components are not used. 
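
The stretch factor quoted for FPM can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the FPM pulse to be sqrt(E_s/T) cos 2*pi*(f0 + m*W0)*t on [-T, T], integrates the squared derivative with respect to m by the midpoint rule, and compares the result with the closed form (E_s/3)(2*pi*T*W0)^2. The function name and all numerical values are hypothetical choices, not taken from the text.

```python
import math

def fpm_stretch_squared(E_s, T, W0, f0, m=0.0, steps=200_000):
    """Midpoint-rule integral of (ds_m/dm)^2 over [-T, T] for the assumed FPM pulse."""
    dt = 2.0 * T / steps
    total = 0.0
    for i in range(steps):
        t = -T + (i + 0.5) * dt
        # d/dm of sqrt(E_s/T) * cos(2*pi*(f0 + m*W0)*t)
        ds_dm = -math.sqrt(E_s / T) * 2.0 * math.pi * W0 * t * math.sin(
            2.0 * math.pi * (f0 + m * W0) * t)
        total += ds_dm * ds_dm * dt
    return total

# Illustrative (hypothetical) parameters with f0 >> W0.
E_s, T, W0, f0, N0 = 1.0, 1.0, 5.0, 500.0, 0.01

numeric = fpm_stretch_squared(E_s, T, W0, f0)
closed = (E_s / 3.0) * (2.0 * math.pi * T * W0) ** 2

# Weak-noise mean-square error: (N0/2) / S^2, and the same thing via beta = 4*T*W0.
beta = 4.0 * T * W0
mse_from_stretch = (N0 / 2.0) / closed
mse_from_beta = 12.0 / (math.pi ** 2 * beta ** 2) * N0 / (2.0 * E_s)

print(numeric, closed, mse_from_stretch, mse_from_beta)
```

The small residual between the numerical and closed-form values is the neglected double-frequency term, which shrinks as f0 grows relative to 1/T.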
Just as with idealized PPM, the (sin x)/x shape of the transmitted pulse
spectrum introduces some error into this estimate of $P[\mathcal{A}]$, but
the error can be substantially eliminated by appropriate pulse-envelope
waveshaping.

When FPM is used to communicate a bandlimited process m(t), we
can allow only a time

$$2T = \frac{1}{2W_m} \tag{8.117a}$$

for each transmission, which implies

$$\beta = \frac{W_0}{W_m}. \tag{8.117b}$$


Figure 8.44 Frequency guard bands with FPM. The RF half-bandwidth occupied by 
the signal is approximately equal to W. 


When m(t) comprises an infinite succession of samples, the mean-square
value of the output noise from a maximum-likelihood receiver in the
absence of anomalies is therefore

$$\overline{n^2(t)} = \frac{12}{\pi^2}\left(\frac{W_m}{W_0}\right)^2\frac{\mathcal{N}_0}{2E_s}; \quad \text{for FPM.} \tag{8.118}$$


The modulating bandwidth and the transmission bandwidth are related
to the effective dimensionality with FPM in the same way as with PPM.
The zero crossings of the (sin x)/x spectrum of each FPM pulse are
spaced $1/2T = 2W_m$ cps apart. If we allow a nominal guard band of
$2W_m$ cps on each side of the frequency interval $[f_0 - W_0, f_0 + W_0]$,
as indicated in Fig. 8.44, the half-bandwidth required for transmission is

$$W = W_0 + 2W_m = W_m(\beta + 2). \tag{8.119}$$

Thus the bandwidth expansion factor $W/W_m$ again equals $(\beta + 2)$.
A major distinction between FPM and PPM is the difference in ease of
maximum-likelihood receiver instrumentation. The frequency-domain
equivalent of the PPM receiver of Fig. 8.28 is a device that takes the
Fourier cosine transform of the relevant received signal over the time
interval [-T, T], determines the frequency f in the range
$[f_0 - W_0, f_0 + W_0]$ for which the transform is maximum, and sets

$$\hat{m} = \frac{f - f_0}{W_0}. \tag{8.120}$$

Clearly, such a device is more difficult to build than the PPM receiver. An
approximation to the FPM maximum-likelihood receiver can, of course,
be constructed as indicated in Fig. 8.26.†

The equivalence of FPM and PPM, including the phase reference
problem, extends also to the possibility of using antipodal FPM signals:
the resulting performance is equivalent to that of ordinary FPM with the
effective dimensionality doubled. Furthermore, quadrature multiplexing
may be used to redouble the effective dimensionality, again without
increasing the transmission bandwidth.

A modulation scheme closely related to antipodal FPM is obtained
when m is used to modulate both the phase and the frequency of the RF
signal:

$$s_m(t) = \begin{cases}
\sqrt{\dfrac{E_s}{T}}\cos 2\pi[f_0 t + mW_0(t + T)]; & -T < t < T \\
0; & \text{elsewhere.}
\end{cases} \tag{8.121}$$

The stretch factor (with double-frequency terms neglected) is

$$S^2 = \frac{E_s}{T}(2\pi W_0)^2\,\frac{1}{2}\int_{-T}^{T}(t + T)^2\,dt = \frac{E_s}{3}(4\pi T W_0)^2, \tag{8.122a}$$

so that, with $\beta$ equal to $4TW_0$ as before,

$$\overline{n^2(t)} = \frac{12}{\pi^2}\left(\frac{W_m}{2W_0}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8.122b}$$


† Darlington$^{20}$ has proposed that radar pulse-compression techniques
may be useful in converting the received FPM signal into PPM format.

For the waveforms of Eq. 8.121, orthogonal signals result when m increases
by $1/\beta = 1/4TW_0$, so that $2\beta$ values of m within the interval
[-1, +1] lead to mutually orthogonal signals. It follows that we again have

$$P[\mathcal{A}] \approx (2\beta - 1)\,\frac{e^{-E_s/2\mathcal{N}_0}}{\sqrt{2\pi E_s/\mathcal{N}_0}}. \tag{8.122c}$$

With maximum-likelihood reception, both $\overline{\epsilon^2}$ and
$P[\mathcal{A}]$ are equivalent to the performance obtained with
antipodal FPM.
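
The orthogonality claim for the phase-and-frequency waveforms can be illustrated numerically. The sketch below assumes the pulse form sqrt(E_s/T) cos 2*pi*[f0*t + m*W0*(t + T)] on [-T, T] and evaluates inner products by the midpoint rule; the function name and parameter values are hypothetical and chosen only so that f0 is much larger than W0.

```python
import math

def inner_product(m1, m2, E_s=1.0, T=1.0, W0=4.0, f0=200.0, steps=100_000):
    """Midpoint-rule inner product of two assumed phase-and-frequency FPM pulses."""
    dt = 2.0 * T / steps
    amp = math.sqrt(E_s / T)
    acc = 0.0
    for i in range(steps):
        t = -T + (i + 0.5) * dt
        s1 = amp * math.cos(2.0 * math.pi * (f0 * t + m1 * W0 * (t + T)))
        s2 = amp * math.cos(2.0 * math.pi * (f0 * t + m2 * W0 * (t + T)))
        acc += s1 * s2 * dt
    return acc

T, W0 = 1.0, 4.0
beta = 4.0 * T * W0                          # effective dimensionality, 4*T*W0

energy = inner_product(0.0, 0.0)             # close to E_s
adjacent = inner_product(0.0, 1.0 / beta)    # m stepped by 1/beta: nearly orthogonal
half_step = inner_product(0.0, 0.5 / beta)   # half that step: clearly correlated

print(energy, adjacent, half_step)
```

The residual in the "adjacent" inner product comes from the double-frequency term, which is neglected in the analysis and becomes smaller as f0 increases.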

When a sequence of signals, each having the form of Eq. 8.121, is used 
to communicate a modulating process m(t), the terminal phase of each 
transmission in turn can be chosen as the initial phase of the succeeding 
one. Frequency modulation (abbreviated FM), to which we next direct 
our attention, is a scheme closely akin to this. In particular, the phase con- 
tinuity of an FM signal enables FM receivers to combat slow drifts in 
phase. 


8.3 FREQUENCY MODULATION 

With PPM and FPM, the modulating process m(t) is sampled once each
$1/2W_m$ sec, and the transmitter and receiver must operate synchronously
in time. On the other hand, with FM the input waveform modulates the
transmitted signal continuously and timing problems do not exist.

An FM signal may be written in the form

$$s_m(t) = A\sqrt{2}\cos 2\pi\left[f_0 t + W_1\int m(t)\,dt\right]. \tag{8.123}$$

Here the factor $\sqrt{2}$ is chosen to normalize the transmitted power to
$A^2$. In contradistinction, for PPM and FPM we normalized the signals so
that $E_s$ was the transmitted energy per input sample. For modulating
processes with rectangular bandwidth $W_m$ the two normalizations are
equivalent when

$$A^2 = 2W_m E_s. \tag{8.124}$$

The argument $\theta(t)$ in any signal having the form $\cos\theta(t)$ is
called the instantaneous phase, and $(1/2\pi)\,d\theta(t)/dt$ is called the
instantaneous frequency, denoted $f_I$. Frequency modulation takes its name
from the fact that $f_I$ depends linearly on m(t). From Eq. 8.123 we have

$$f_I = \frac{d}{dt}\left[f_0 t + W_1\int m(t)\,dt\right] = f_0 + W_1 m(t). \tag{8.125a}$$
If we normalize m(t) so that

$$|m(t)| \le 1; \quad \text{for all } t, \tag{8.125b}$$

then

$$f_0 - W_1 \le f_I \le f_0 + W_1 \tag{8.125c}$$

and $W_1$ is the maximum instantaneous-frequency deviation.

The modulation index of an FM signal is defined as the ratio of the
maximum instantaneous-frequency deviation to the bandwidth of m(t).
The modulation index plays the same role in FM as effective
dimensionality in PPM and FPM. For this reason we also denote it $\beta$:

$$\beta \triangleq \frac{W_1}{W_m}; \quad \text{for FM,} \tag{8.126}$$

when $|m(t)| \le 1$.

Signal Bandwidth 

We now show that the half-bandwidth, again denoted W, required to
pass most of the energy of an FM signal is approximately related to $W_m$
and $\beta$ by the familiar equation

$$\frac{W}{W_m} = (\beta + 2). \tag{8.127}$$

An instructive, albeit imprecise, justification for the relationship follows
from quasi-static extension of arguments already encountered in
connection with FPM. The rate of change of $f_I$ in an FM signal is
essentially controlled by the bandwidth of m(t); over any interval that is
short compared with $1/2W_m$, $f_I$ will be more or less constant. But we
have observed in Fig. 8.43 that the spectrum of a $1/2W_m$-sec pulse of
sinusoid varies as (sin x)/x, with zero crossings $2W_m$ cps apart. When
$\beta \gg 1$, we may visualize the spectrum requirements for FM in terms of
a similar (sin x)/x pulse that moves quasi-statically within the band
$[f_0 - W_1, f_0 + W_1]$ under the control of m(t). Allowing a nominal guard
band of $2W_m$ cps on either side of the range of $f_I$ leads to Eq. 8.127.

An exact expression for the spectrum of an FM signal can be obtained
when m(t) is a sinusoid. Letting $m(t) = \cos 2\pi W_m t$,
$\omega_0 = 2\pi f_0$, and $\omega_m = 2\pi W_m$, we have

$$\begin{aligned}
s_m(t) &= A\sqrt{2}\cos(\omega_0 t + \beta\sin\omega_m t) \\
&= A\sqrt{2}\,[\cos(\beta\sin\omega_m t)\cos\omega_0 t - \sin(\beta\sin\omega_m t)\sin\omega_0 t].
\end{aligned} \tag{8.128}$$
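But for any integer k

$$J_k(\beta) \triangleq \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{j(\beta\sin\alpha - k\alpha)}\,d\alpha, \tag{8.129}$$

in which $J_k(\beta)$ is the kth-order Bessel function† of the first
kind.$^{83}$ It may be verified from Eq. 8.129 that

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\cos(\beta\sin\alpha)\cos k\alpha\,d\alpha =
\begin{cases} J_k(\beta), & k \text{ even} \\ 0, & k \text{ odd,} \end{cases}$$

$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\sin(\beta\sin\alpha)\sin k\alpha\,d\alpha =
\begin{cases} 0, & k \text{ even} \\ J_k(\beta), & k \text{ odd.} \end{cases}$$

Thus the Fourier series expansion of $s_m(t)$ is

$$\begin{aligned}
s_m(t) = {} & A\sqrt{2}\cos\omega_0 t\left[J_0(\beta) + 2\sum_{k\ \mathrm{even}} J_k(\beta)\cos k\omega_m t\right] \\
& - A\sqrt{2}\sin\omega_0 t\left[2\sum_{k\ \mathrm{odd}} J_k(\beta)\sin k\omega_m t\right]; \quad k > 0.
\end{aligned} \tag{8.130}$$

With cosinusoidal modulation at frequency $W_m$, the spectrum of $s_m(t)$
contains an infinite number of discrete terms at frequencies
$f_0 \pm kW_m$, k = 0, 1, 2, ....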

Bessel functions appear frequently in mathematical physics and have
been extensively tabulated.$^{46}$ Representative examples are plotted
against $\beta$ in Fig. 8.45. For $k > \beta > 2$ it can be shown from the
series expansion$^{50}$ of $J_k(\beta)$ that

$$|J_k(\beta)| < \frac{1}{k!}\left(\frac{\beta}{2}\right)^k. \tag{8.131a}$$

Introducing Stirling's approximation to the factorial into Eq. 8.131a, we
note in particular that

$$|J_k(\beta)| < \frac{1}{\sqrt{2\pi k}}\left(\frac{e\beta}{2k}\right)^k, \tag{8.131b}$$

which goes rapidly to zero as k increases beyond $(e\beta/2)$.

We conclude from Eq. 8.131b that only a finite number of terms in
Eq. 8.130 actually contribute significantly to $s_m(t)$ and that this number
grows linearly with $\beta$. But we must also conclude that the first
sideband components at $f_0 \pm W_m$ are significant even when
$\beta \ll 1$; neglecting them would mean approximating $s_m(t)$ by a pure
sinusoid, without any modulation at all! Both conclusions are consistent
with Eq. 8.127 and are further substantiated by Fig. 8.46, in which we plot
the spectrum of $s_m(t)$ for cosinusoidal modulation with various values of
modulation index. Equation 8.127 is especially accurate when $\beta \gg 1$.
For $\beta \ll 1$ a half-bandwidth of $W_m$ cps would suffice.

† The function $I_0(x)$ encountered in Eq. 7.49 is equal to $J_0(jx)$.

Figure 8.46 Relative magnitude of spectral components at $f_0 \pm kW_m$ as
a function of k for FM with cosinusoidal modulation.
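
As a rough numerical illustration of the bandwidth rule, the sketch below evaluates the Bessel coefficients from their integral definition, J_k(beta) = (1/2 pi) times the integral over [-pi, pi] of cos(beta sin(alpha) - k alpha), and computes the fraction of signal power carried by the carrier and the sidebands lying within (beta + 2) multiples of W_m of the carrier. The function names, the midpoint-rule quadrature, and the choice of integer test values of beta are assumptions made for this example.

```python
import math

def bessel_j(k, beta, n=2000):
    """J_k(beta) from the integral definition, evaluated by the midpoint rule."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        alpha = -math.pi + (i + 0.5) * h
        # Real part of exp(j(beta*sin(alpha) - k*alpha)); the imaginary part integrates to zero.
        total += math.cos(beta * math.sin(alpha) - k * alpha)
    return total * h / (2.0 * math.pi)

def power_fraction(beta):
    """Fraction of FM power at f0 +/- k*W_m with k <= beta + 2 (integer beta assumed)."""
    k_max = int(round(beta)) + 2
    inside = bessel_j(0, beta) ** 2
    # Sidebands at +k and -k carry equal power, hence the factor 2.
    inside += 2.0 * sum(bessel_j(k, beta) ** 2 for k in range(1, k_max + 1))
    return inside

fractions = {beta: power_fraction(beta) for beta in (1, 5, 10)}
for beta, frac in fractions.items():
    print(beta, frac)
```

For these test values the fraction stays above 99 percent, which is consistent with treating W_m(beta + 2) as the half-bandwidth needed to pass most of the signal energy.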

Weak-Noise Suppression 

We now consider the weak-noise suppression afforded by frequency 
modulation. Our first step is to analyze the mean-square noise produced 
at the output of an idealized conventional FM receiver. We then investigate 
pre-emphasis FM, which is also called phase modulation or PM, and show 
that conventional receivers perform as well as maximum-likelihood re- 
ceivers when the noise is sufficiently weak. We conclude by comparing 
the weak-noise performance of FM and PM with that of FPM and PPM. 
The probability of anomaly is studied in the next subsection. 
Conventional FM receivers. An idealized conventional FM receiver is
diagrammed in Fig. 8.47. The received signal is first passed through a
rectangular unit-gain filter $H_1(f)$ centered on the transmitted carrier
frequency $f_0$. The output from $H_1(f)$ is then heterodyned into the IF
filter $H_2(f)$, which is also rectangular with unit gain. Both $H_1(f)$ and
$H_2(f)$ have half-bandwidth $W = W_1 + 2W_m$, so that we may presume the
input to the limiter-discriminator in the absence of noise to be a
relatively undistorted replica of the transmitted signal, except for
amplitude scaling and frequency translation. In the noiseless case we
therefore assume

$$r_3(t) = A\cos 2\pi\left[f_2 t + W_1\int m(t)\,dt\right]. \tag{8.132}$$

Figure 8.47 Idealized conventional FM receiver:
$s_m(t) = A\sqrt{2}\cos 2\pi[f_0 t + W_1\int m(t)\,dt]$.

Frequency demodulation is accomplished in the limiter-discriminator,
which is a nonlinear device that responds only to variations in
instantaneous frequency.$^{2,71}$ We model the device mathematically by
saying that whenever the input to the device is

$$r_3(t) = a(t)\cos[2\pi f_2 t + \theta(t)], \tag{8.133a}$$

the output is

$$r_4(t) = \frac{d\theta(t)}{dt}. \tag{8.133b}$$
But we have seen that the spectrum of the derivative of a signal with
Fourier transform X(f) is $j2\pi f X(f)$. For purposes of analysis we may
therefore replace the limiter-discriminator by a box that extracts
$\theta(t)$ from $r_3(t)$, followed by a "differentiating filter" with
transfer function

$$H'(f) \triangleq j2\pi f, \tag{8.134}$$

as shown in Fig. 8.48.

Figure 8.48 Idealized functional representation of limiter-discriminator.

The final stages of an idealized conventional FM receiver comprise an
ideal unit-gain filter $W_m(f)$ that eliminates noise outside the band
$[-W_m, W_m]$ occupied by m(t) and an attenuator with gain G. In the
noiseless case

$$r_4(t) = \frac{d}{dt}\left[2\pi W_1\int m(t)\,dt\right] = 2\pi W_1 m(t), \tag{8.135a}$$

so that we obtain $\hat{m}(t) = m(t)$ by setting

$$G = \frac{1}{2\pi W_1}. \tag{8.135b}$$


Weak-noise effects with conventional receivers. We now make a
quasi-static analysis of the mean-square output noise,
$\overline{n^2(t)}$, obtained with conventional FM reception in weak
additive white Gaussian noise. We begin by assuming that the modulating
signal is some constant, say

$$m(t) = m_I; \quad -1 \le m_I \le 1, \tag{8.136a}$$

so that the RF filter output in Fig. 8.47 is

$$r_1(t) = A\sqrt{2}\cos 2\pi(f_0 + W_1 m_I)t + n_1(t). \tag{8.136b}$$

Here $n_1(t)$ is a bandpass Gaussian process with power density

$$\mathcal{S}_{n_1}(f) = \begin{cases}
\dfrac{\mathcal{N}_0}{2}; & f_0 - W < |f| < f_0 + W \\
0; & \text{elsewhere.}
\end{cases} \tag{8.137}$$

We next resolve $n_1(t)$ into two components, one in and one out of
phase with $\cos\omega_I t$, where

$$\omega_I = 2\pi f_I = 2\pi(f_0 + W_1 m_I). \tag{8.138}$$
$$n_1(t) = n_c(t)\sqrt{2}\cos\omega_I t + n_s(t)\sqrt{2}\sin\omega_I t. \tag{8.139}$$

In accordance with Eq. 7.24b, $n_c(t)$ and $n_s(t)$ are stationary lowpass
Gaussian processes with power spectra given by the even part of
$\mathcal{S}_{n_1}(f + f_I)$. The power density spectra
$\mathcal{S}_{n_c}(f)$ and $\mathcal{S}_{n_s}(f)$ are therefore as plotted
in Fig. 8.49. Note that the restriction $|m_I| \le 1$ guarantees that
$\mathcal{S}_{n_c}(f)$ and $\mathcal{S}_{n_s}(f)$ equal $\mathcal{N}_0/2$
for all f within the restricted interval $[-2W_m, 2W_m]$.

Figure 8.49 Noise power density spectra, showing $\mathcal{S}_{n_1}(f)$
about $\pm f_0$ and $\pm f_I$ and the lowpass spectra
$\mathcal{S}_{n_c}(f) = \mathcal{S}_{n_s}(f)$ with breakpoints at
$W - |m_I|W_1$ and $W + |m_I|W_1$. Because $W = W_1 + 2W_m$ and
$|m_I| \le 1$, $W - |m_I|W_1 \ge 2W_m$.

Substituting Eq. 8.139 in Eq. 8.136b yields

$$r_1(t) = [A + n_c(t)]\sqrt{2}\cos\omega_I t + n_s(t)\sqrt{2}\sin\omega_I t. \tag{8.140a}$$

Introducing the polar transformation diagrammed in Fig. 8.50, we rewrite
this equation as

$$r_1(t) = a(t)\sqrt{2}\cos[\omega_I t + \phi(t)], \tag{8.140b}$$

in which

$$a(t) = \sqrt{[A + n_c(t)]^2 + n_s^2(t)} \tag{8.140c}$$

$$\phi(t) = \tan^{-1}\frac{-n_s(t)}{A + n_c(t)}. \tag{8.140d}$$

It follows that the input to the limiter-discriminator in Fig. 8.47 is

$$r_3(t) = a(t)\cos[2\pi(f_2 + W_1 m_I)t + \phi(t)]. \tag{8.141}$$
Next, we introduce the weak-noise assumption that the total noise
power in the over-all signal bandwidth is much less than the signal power:

$$2\mathcal{N}_0 W \ll A^2. \tag{8.142}$$

Under this condition, the approximation

$$\phi(t) = \tan^{-1}\frac{-n_s(t)}{A + n_c(t)} \approx \frac{-n_s(t)}{A + n_c(t)} \approx \frac{-n_s(t)}{A} \tag{8.143}$$

is valid except for certain improbable (hence infrequent) time intervals
during which $|n_s(t)|$ and/or $|n_c(t)|$ assume values much larger than
usual.

Figure 8.50 Polar transformation for substitution in the trigonometric
identity $\cos(\alpha + \beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta$.

Excluding such intervals from consideration, we have

$$r_3(t) \approx a(t)\cos\left[\omega_2 t + 2\pi W_1 m_I t - \frac{n_s(t)}{A}\right]. \tag{8.144}$$

The final step in our analysis is to recall that the output of the
limiter-discriminator, $r_4(t)$, may be identified with the output of a
differentiating filter $H'(f) = j2\pi f$ whose input is
$[2\pi W_1 m_I t - n_s(t)/A]$. Since the receiver output is obtained from
$r_4(t)$ by attenuation and lowpass filtering, neglecting the approximation
involved in Eq. 8.144 yields

$$\hat{m}(t) \approx m_I + n(t), \tag{8.145a}$$

in which n(t) is a stationary Gaussian noise whose power density function,

$$\mathcal{S}_n(f) = \left(\frac{G}{A}\right)^2\mathcal{S}_{n_s}(f)\,|H'(f)|^2\,|W_m(f)|^2, \tag{8.145b}$$

is plotted in Fig. 8.51. From Eqs. 8.134 and 8.135b

$$\overline{n^2(t)} = \int_{-\infty}^{\infty}\mathcal{S}_n(f)\,df = \left(\frac{1}{2\pi W_1 A}\right)^2\int_{-W_m}^{W_m}\frac{\mathcal{N}_0}{2}(2\pi f)^2\,df = \frac{\mathcal{N}_0 W_m^3}{3W_1^2 A^2}.$$
Figure 8.51 Noise power density spectrum at output of FM receiver with
weak noise and constant modulation.

In terms of the energy transmitted per sample of m(t), it follows from
Eq. 8.124 that the mean-square effect of weak white noise on the output
of a conventional FM receiver is

$$\overline{n^2(t)} = \frac{1}{3\beta^2}\,\frac{\mathcal{N}_0}{2E_s}. \tag{8.146}$$

Thus far we have considered m(t) to be a constant, much as in the
FPM case. The analysis can be extended to situations in which m(t) is
not a constant, provided that $\beta \gg 1$. As shown in Fig. 8.52 we can
then approximate m(t) by a succession of rectangular pulses of duration
$\Delta$, where $\Delta$ can be chosen short compared with $1/2W_m$ and long
compared with $1/2W_1$. Thus it is possible to make the approximation to
m(t) a good one and simultaneously to retain the essential validity of the
foregoing (static) weak-noise analysis. Within this quasi-static
approximation, the output noise is therefore given by Eq. 8.146 for large
$\beta$ and arbitrary (bounded) m(t). It is found experimentally that
Eq. 8.146 remains approximately correct even for moderate values of
$\beta$.
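
The weak-noise result for conventional FM can be checked by integrating the parabolic output spectrum directly. The sketch below assumes the output density (G/A)^2 (N0/2) (2 pi f)^2 on [-W_m, W_m] with G = 1/(2 pi W_1), and compares the integral with (1/(3 beta^2)) N0/(2 E_s), using A^2 = 2 W_m E_s and beta = W_1/W_m. The function names and numerical values are hypothetical.

```python
import math

def fm_noise_numeric(A, W1, Wm, N0, steps=20_000):
    """Midpoint-rule integral of the assumed parabolic output noise density."""
    G = 1.0 / (2.0 * math.pi * W1)          # attenuator gain
    df = 2.0 * Wm / steps
    total = 0.0
    for i in range(steps):
        f = -Wm + (i + 0.5) * df
        density = (G / A) ** 2 * (N0 / 2.0) * (2.0 * math.pi * f) ** 2
        total += density * df
    return total

def fm_noise_closed(E_s, beta, N0):
    """Closed form (1 / (3 beta^2)) * N0 / (2 E_s)."""
    return (1.0 / (3.0 * beta ** 2)) * N0 / (2.0 * E_s)

# Illustrative (hypothetical) values: 75 kc deviation, 15 kc message bandwidth.
A, W1, Wm, N0 = 2.0, 75e3, 15e3, 1e-9
E_s = A ** 2 / (2.0 * Wm)                   # energy per sample, from A^2 = 2 W_m E_s
beta = W1 / Wm                              # modulation index

numeric = fm_noise_numeric(A, W1, Wm, N0)
closed = fm_noise_closed(E_s, beta, N0)
print(numeric, closed)
```

Doubling beta in this sketch reduces the computed noise by a factor of four, which is the inverse-square dependence on modulation index that the closed form predicts.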
Figure 8.53 Pre-emphasis FM, or PM, transmitter.

Phase modulation. Until now we have been concerned only with the
total mean-square noise as a criterion of communication system
performance in the absence of anomaly. As a practical matter, however, the
distribution of noise power as a function of frequency may also be
important. For example, a difficulty with FM is that the noise power
density at the receiver output is not uniform; we have seen in Fig. 8.51
that $\mathcal{S}_n(f)$ varies as $f^2$ for $-W_m < f < W_m$. Thus the
high-frequency components of m(t), which are particularly important to
articulation when m(t) represents a voice signal, are received with less
fidelity than the low-frequency components.

This difficulty is overcome in practice by passing m(t) through a linear
filter that pre-emphasizes high frequencies before modulation. Such a
transmitter is shown in Fig. 8.53, in which the pre-emphasis is
accomplished by means of a differentiating filter. In this case the
transmitted process is

$$\begin{aligned}
s_m(t) &= A\sqrt{2}\cos 2\pi\left[f_0 t + W_2\int\frac{dm(t)}{dt}\,dt\right] \\
&= A\sqrt{2}\cos 2\pi[f_0 t + W_2 m(t)].
\end{aligned} \tag{8.147}$$

In Eq. 8.147 it is the instantaneous phase $2\pi W_2 m(t)$ of $s_m(t)$ that
is linearly related to m(t); as a consequence such a signal is called phase
modulated (PM).

A PM signal can be received by means of a conventional FM receiver
and a de-emphasizing filter, as indicated in Fig. 8.54. Consider the FM
receiver embedded in the figure and set the attenuator gain to

$$G = \frac{1}{2\pi W_2}. \tag{8.148}$$

Figure 8.54 Receiver for pre-emphasis FM.

When the RF and IF bandwidths are large enough to pass $s_m(t)$ with
negligible distortion, the FM receiver output in the absence of anomaly is
$m'(t) + n^*(t)$, where $n^*(t)$ is a Gaussian noise process and (in
accordance with Eq. 8.145b)

$$\mathcal{S}_{n^*}(f) = \begin{cases}
\left(\dfrac{2\pi f}{2\pi A W_2}\right)^2\dfrac{\mathcal{N}_0}{2}; & -W_m < f < W_m \\
0; & \text{elsewhere.}
\end{cases} \tag{8.149}$$

The output of the de-emphasis (or integrating) filter in Fig. 8.54 is
therefore

$$\hat{m}(t) = m(t) + n(t), \tag{8.150a}$$

in which

$$\mathcal{S}_n(f) = \begin{cases}
\left(\dfrac{1}{2\pi A W_2}\right)^2\dfrac{\mathcal{N}_0}{2}; & -W_m < f < W_m \\
0; & \text{elsewhere.}
\end{cases} \tag{8.150b}$$

We note that n(t) has uniform power density over the modulation
bandwidth, as was desired. The mean-square value of the final noise output
(with weak noise input) is

$$\overline{n^2(t)} = \frac{\mathcal{N}_0 W_m}{(2\pi A W_2)^2}. \tag{8.151a}$$

In terms of the energy transmitted per sample of m(t), Eq. 8.151a is

$$\overline{n^2(t)} = \left(\frac{1}{2\pi W_2}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8.151b}$$

Maximum-likelihood reception. Thus far we have considered
conventional examples of FM receiving techniques but have not
determined whether other methods of reception would be superior. We now
show that the weak-noise performance of the pre-emphasis receiver of
Fig. 8.54 is, in fact, as good as that obtainable with maximum-likelihood
reception. On the other hand, it does not follow that the two receivers
are equally ideal in regard to the incidence of anomalies. Indeed, in the
sequel we shall see that they are not.
The weak-noise performance of a maximum-likelihood receiver when

$$s_m(t) = A\sqrt{2}\cos 2\pi[f_0 t + W_2 m(t)] \tag{8.152a}$$

may be determined by again representing m(t) by a vector,
$\mathbf{m} = (m_{-K/2}, \ldots, m_{K/2})$. For m(t) bandlimited to
$[-W_m, W_m]$, we have

$$s_{\mathbf{m}}(t) = A\sqrt{2}\cos 2\pi\left[f_0 t + W_2\sum_{k=-K/2}^{K/2} m_k\psi_k(t)\right], \tag{8.152b}$$

where the $\{\psi_k(t)\}$ are the sampling functions of Eq. 8.43. [We assume
initially that only K + 1 samples of m(t) can be nonzero.] We then have

$$\frac{\partial s_{\mathbf{m}}(t)}{\partial m_j} = [-2\pi A W_2\,\psi_j(t)]\sqrt{2}\sin 2\pi\left[f_0 t + W_2\sum_k m_k\psi_k(t)\right]; \quad j = -\frac{K}{2}, \ldots, \frac{K}{2}. \tag{8.153a}$$

Hence

$$\int_{-\infty}^{\infty}\frac{\partial s_{\mathbf{m}}(t)}{\partial m_j}\frac{\partial s_{\mathbf{m}}(t)}{\partial m_l}\,dt = \int_{-\infty}^{\infty}(2\pi A W_2)^2\,\psi_j(t)\psi_l(t)\left[1 - \cos 4\pi\left(f_0 t + W_2\sum_k m_k\psi_k(t)\right)\right]dt. \tag{8.153b}$$

Since the $\{\psi_k(t)\}$ are orthonormal, neglecting the high-frequency term
yields

$$\int_{-\infty}^{\infty}\frac{\partial s_{\mathbf{m}}(t)}{\partial m_j}\frac{\partial s_{\mathbf{m}}(t)}{\partial m_l}\,dt = (2\pi A W_2)^2\,\delta_{jl}, \quad \text{for all } j \text{ and } l. \tag{8.153c}$$

Equation 8.153c may be interpreted geometrically as indicated in
Fig. 8.55. In the vicinity of any particular signal vector, say
$\mathbf{s_m} = \mathbf{s}_0$, small changes in $m_j$ and $m_l$ cause
$\mathbf{s_m}$ to move in orthogonal directions when $j \ne l$. But with weak
enough noise it is only the signal locus, in this case a
(K + 1)-dimensional surface, in the immediate vicinity of the transmitted
signal that enters into the determination of a maximum-likelihood
receiver's estimate of $\mathbf{m}$. Furthermore, with white Gaussian noise,
only the noise projection onto the signal locus is relevant and noise
projections along orthonormal axes are statistically independent zero-mean
Gaussian random variables with variance $\mathcal{N}_0/2$. Thus in weak
noise the geometrical relations are locally equivalent to those previously
encountered in the sequence-of-parameters communication problem of
Section 8.1: only the set of orthonormal functions, which are obtained by
normalizing the right-hand side of Eq. 8.153a to unit energy, is different.
The identity of these orthonormal functions depends on $\mathbf{m}$; their
orthogonality does not.

Figure 8.55 Geometrical interpretation of Eq. 8.153c.

It follows immediately from Fig. 8.55 that the weak-noise mean-square
error in a maximum-likelihood estimate of each of the $\{m_k\}$ is
$\mathcal{N}_0/2$ divided by the square of the stretch factor in the
direction specified by $\partial s_{\mathbf{m}}(t)/\partial m_k$. But the
stretch is the same for all k: from Eq. 8.153c

$$S^2 = (2\pi A W_2)^2. \tag{8.154}$$

Thus a maximum-likelihood receiver in weak noise produces

$$\hat{m}(t) = m(t) + \sum_{k=-K/2}^{K/2} n_k\psi_k(t), \tag{8.155a}$$

in which the $\{n_k\}$ are zero-mean Gaussian random variables with

$$\overline{n_k^2} = \frac{\mathcal{N}_0/2}{(2\pi A W_2)^2}. \tag{8.155b}$$

Letting K go to infinity and invoking Eq. 8.50 yields

$$\hat{m}(t) = m(t) + n(t) \tag{8.156a}$$

with

$$\overline{n^2(t)} = \frac{\mathcal{N}_0 W_m}{(2\pi A W_2)^2}. \tag{8.156b}$$

Comparison of Eqs. 8.156b and 8.151a verifies the fact that the
performance of conventional pre-emphasis FM receivers in weak noise is
identical to that obtainable with maximum-likelihood reception.


System comparison. It is interesting to contrast the weak-noise
performance of FM, PM, and ordinary FPM when the signals are
simultaneously normalized to the same transmitted energy per sample of m(t)
and the same transmission bandwidth. For convenience of reference we
summarize the weak-noise performances of these systems.

FPM.

$$s_m(t) = \begin{cases}
A\sqrt{2}\cos 2\pi(f_0 + mW_0)t; & -\dfrac{1}{4W_m} < t < \dfrac{1}{4W_m} \\
0; & \text{elsewhere,}
\end{cases}
\qquad
\overline{n^2(t)} = \frac{12}{\pi^2}\left(\frac{W_m}{W_0}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8.157a}$$

FM.

$$s_m(t) = A\sqrt{2}\cos 2\pi\left[f_0 t + W_1\int m(t)\,dt\right],
\qquad
\overline{n^2(t)} = \frac{1}{3}\left(\frac{W_m}{W_1}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8.157b}$$

PM.

$$s_m(t) = A\sqrt{2}\cos 2\pi[f_0 t + W_2 m(t)],
\qquad
\overline{n^2(t)} = \left(\frac{1}{2\pi W_2}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8.157c}$$

For each system the signal power has been normalized to $A^2$, so that the
energy transmitted per sample is

$$E_s = \frac{A^2}{2W_m}. \tag{8.158}$$

The question of bandwidth normalization remains. For all three
signals the RF half-bandwidth W may be taken as the maximum
instantaneous-frequency deviation, plus a guard band of $2W_m$ cps:

$$W = |\Delta f|_{\max} + 2W_m, \tag{8.159a}$$

$$\Delta f \triangleq f_I - f_0. \tag{8.159b}$$
At any sampling instant $t_k$, we have the following relations.

FPM.

$$\Delta f = W_0\,m(t_k). \tag{8.160a}$$

FM.

$$\Delta f = W_1\,m(t_k). \tag{8.160b}$$

PM.

$$\Delta f = W_2\,m'(t_k). \tag{8.160c}$$
Equation 8.160a follows from Eq. 8.49b. It would be desirable to scale
the parameters $W_0$, $W_1$, and $W_2$ in such a way that the maximum
instantaneous-frequency deviations are the same. Difficulty arises,
however, because there is no general way of specifying the relationship
between $|m(t)|_{\max}$ and $|m'(t)|_{\max}$.

This difficulty may be partly resolved by assuming that m(t) is the result
of passing a white Gaussian process of spectral density
$\mathcal{M}_0/2$, $-\infty < f < \infty$, through an ideal filter
$W_m(f)$, as shown in Fig. 8.56. Each of the instantaneous-frequency
deviations in Eqs. 8.160 is then a zero-mean Gaussian random variable and
is, of course, unbounded. The residual difficulty entailed by the
assumption that m(t) is Gaussian is that it is not meaningful to normalize
the maximum value of the $\Delta f$.


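Figure 8.56 An ideal bandlimited Gaussian message process. A white
Gaussian process with spectral density $\mathcal{M}_0/2$ (all f) is passed
through the ideal filter $W_m(f)$ to produce m(t), for which
$\mathcal{S}_m(f) = \mathcal{M}_0/2$ for $-W_m < f < W_m$ and 0 elsewhere.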

On the other hand, it is meaningful (although less satisfying) to
normalize $W_0$, $W_1$, and $W_2$ in such a way that the probability that
$\Delta f$ exceeds any stated value, say, $W_I$, is the same for all three
systems. Then $W_I$ can be chosen so that the probability
$P[\Delta f > W_I]$ is acceptably small, say comparable to the probability
of anomaly, and the transmission half-bandwidth can be taken as

$$W = W_I + 2W_m. \tag{8.161}$$

The advantage of the assumption that m(t) is a Gaussian process is that
it simplifies equating the probabilities $P[\Delta f > W_I]$. From Fig. 8.56
we have

$$\overline{m^2(t)} = \mathcal{M}_0 W_m \tag{8.162a}$$

and

$$\overline{[m'(t)]^2} = \int_{-W_m}^{W_m}(2\pi f)^2\,\frac{\mathcal{M}_0}{2}\,df = \frac{1}{3}(2\pi W_m)^2\,\mathcal{M}_0 W_m. \tag{8.162b}$$

From Eqs. 8.160, the variances of $\Delta f$ are therefore as follows:

FPM.

$$\overline{\Delta f^2} = W_0^2(\mathcal{M}_0 W_m). \tag{8.163a}$$

FM.

$$\overline{\Delta f^2} = W_1^2(\mathcal{M}_0 W_m). \tag{8.163b}$$

PM.

$$\overline{\Delta f^2} = \tfrac{1}{3}W_2^2(2\pi W_m)^2(\mathcal{M}_0 W_m). \tag{8.163c}$$


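Clearly, the three $P[\Delta f > W_I]$ are equal for any $W_I$ if we scale
the parameters $W_0$, $W_1$, and $W_2$ so that

$$W_0 = W_1 = \frac{2\pi W_m}{\sqrt{3}}\,W_2. \tag{8.164}$$

If we adopt $W_1$ as the basic parameter, the normalization of Eq. 8.164
implies that the mean-square noise values of Eqs. 8.157 are the following.

FPM.

$$\overline{n^2(t)} = \frac{12}{\pi^2}\left(\frac{W_m}{W_1}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8.165a}$$

FM.

$$\overline{n^2(t)} = \frac{1}{3}\left(\frac{W_m}{W_1}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8.165b}$$

PM.

$$\overline{n^2(t)} = \frac{1}{3}\left(\frac{W_m}{W_1}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8.165c}$$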

We see that PM and FM yield equal noise-output power, whereas 
ordinary FPM is approximately 6 db inferior. This difference in weak- 
noise performance reflects the fact that FM and PM are analogous to the 
FPM scheme of Eq. 8.121 in which m modulates both the RF phase and 
frequency. Given an adequate phase reference and maximum-likelihood 
reception, either phase-and-frequency or antipodal FPM would afford a 
weak-noise mean-square error essentially equal to that provided by FM 
or PM. Perhaps even more significant, however, is the fact that the 
availability of a phase reference would permit the use of quadrature 
multiplexed antipodal FPM (or PPM). With bandwidth normalization,
the $\overline{n^2(t)}$ afforded by this technique in the absence of
anomaly would be 6 db superior to that afforded by FM or PM.
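
The decibel figures in this comparison can be reproduced with a few lines of arithmetic. The sketch below assumes the weak-noise expressions (12/pi^2)(W_m/W_0)^2 x for ordinary FPM, (1/3)(W_m/W_1)^2 x for FM, and x/(2 pi W_2)^2 for PM, where x stands for N0/(2 E_s), together with the equal-deviation-probability scaling W_0 = W_1 = (2 pi W_m / sqrt 3) W_2. Function names and numerical values are hypothetical.

```python
import math

def noise_fpm(W0, Wm, x):
    """Ordinary FPM weak-noise output, with x = N0 / (2 E_s)."""
    return 12.0 / math.pi ** 2 * (Wm / W0) ** 2 * x

def noise_fm(W1, Wm, x):
    """Conventional FM weak-noise output."""
    return (1.0 / 3.0) * (Wm / W1) ** 2 * x

def noise_pm(W2, x):
    """PM (pre-emphasis FM) weak-noise output."""
    return x / (2.0 * math.pi * W2) ** 2

# Illustrative (hypothetical) values.
W1, Wm, x = 75e3, 15e3, 1e-3

# Scaling that equalizes the variance of the frequency deviation in all three systems.
W0 = W1
W2 = math.sqrt(3.0) * W1 / (2.0 * math.pi * Wm)

fpm, fm, pm = noise_fpm(W0, Wm, x), noise_fm(W1, Wm, x), noise_pm(W2, x)
gap_db = 10.0 * math.log10(fpm / fm)        # ordinary FPM relative to FM or PM

print(fpm, fm, pm, gap_db)
```

The ratio is 36/pi^2, about 5.6 db, which is the "approximately 6 db" quoted for ordinary FPM, and it does not depend on the particular values chosen for W_1, W_m, or x.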


Probability of Anomaly 


Although the mean-square noise produced by conventional FM receivers
in the absence of anomaly is equivalent to that afforded by antipodal
FPM, the equivalence does not extend to the probability of anomaly.
We now show that conventional FM receivers yield a $P[\mathcal{A}]$ that
is substantially inferior.

The mechanism leading to anomalies with conventional FM receivers is
inherent in the behavior of the signal phase at the limiter-discriminator
input. In the absence of modulation, we have used (Eq. 8.143) the
approximation

$$\phi(t) = \tan^{-1}\frac{-n_s(t)}{A + n_c(t)} \approx -\frac{n_s(t)}{A}.$$

Figure 8.57 Phasor interpretation of anomalies with conventional FM
receivers: $a(t) = \sqrt{[A + n_c(t)]^2 + n_s^2(t)}$. (b) No encirclement
of origin. (c) Encirclement of origin; the integral of $d\phi/dt$ along the
trajectory of a(t) equals $2\pi$.

But this approximation is valid only during intervals over which both
$|n_c(t)|$ and $|n_s(t)|$ are small in relation to A, and even with weak
noise these conditions will occasionally be violated. For example, as
$n_c(t)$ and $n_s(t)$ change randomly with time there is a certain
probability that the tip of the phasor a(t) of Fig. 8.57a will swing close
to the origin, as shown in Fig. 8.57b.

As long as the resultant phasor does not encircle the origin, the effect
of such a swing on the receiver is not particularly disastrous: for
instance, if we assume that a(t) moves with uniform velocity along the
trajectory of Fig. 8.57b, the accompanying plot of $d\phi/dt$ shows a small
positive value during the traverse, with a sharp negative spike in the
center. The net change in $\phi$ is zero, which implies that most of the
energy content of $d\phi/dt$ is in high frequencies that are eliminated by
the output filter.
On the other hand, if the tip of a(t) does encircle the origin, as shown in
Fig. 8.57c, the effect is very different. The limiter-discriminator output,
$d\phi/dt$, now exhibits a sharp positive pulse and the net change in
$\phi$ is $2\pi$ rather than zero. Since $n_c(t)$ and $n_s(t)$ are lowpass
noise processes with bandwidth $W = W_1 + 2W_m$, to the narrow-bandwidth
filter $W_m(f)$ such a pulse typically will look like an impulse of value
$2\pi$. It follows that there is now substantial low-frequency energy in the
disturbance, which effectively overrides any substantial dependence of the
receiver output on the transmitted signal. Thus $\hat{m}(t)$ is
"disconnected" from m(t), and we say that an anomaly has occurred.

Following arguments that are due to Rice,$^{70}$ we can estimate the
probability, again denoted $P[\mathcal{A}]$, that at least one origin
encirclement occurs during the interval $[0, 1/2W_m]$. As shown in
Fig. 8.57c, a necessary condition for origin encirclement is that $n_c(t)$
must be less than $-A$ during some included subinterval during which
$n_s(t)$ changes sign. Following such a crossing of the negative real axis,
a(t) will proceed to encircle the origin during the interval
$[0, 1/2W_m]$ with a probability that (conservatively) will
be greater than It follows that P[A] may be estimated by dividing 
$[0, 1/2W_m]$ into a large number of small intervals of duration
$\Delta \ll 1/2W_m$ (as shown in Fig. 8.52) and calculating the probability
that a(t) will cross the negative real axis during at least one of these
$1/(2W_m\Delta)$ subintervals.

Now let us focus upon the particular subinterval $[0, \Delta]$ and consider
the two events

$$B_1 = \{n_c(t) < -A \text{ during } [0, \Delta]\}$$

$$B_2 = \{n_s(t) \text{ crosses through zero during } [0, \Delta]\}.$$

The intersection $B_1 \cap B_2$ is just the event that a(t) does cross the
negative real axis during $[0, \Delta]$. Furthermore, $n_c(t)$ and $n_s(t)$
are statistically independent when the modulating signal m(t) is
identically zero, which implies that in this case

$$P[B_1 \cap B_2] = P[B_1]\,P[B_2]. \tag{8.166}$$

In the absence of modulation both $n_c(t)$ and $n_s(t)$ are lowpass Gaussian
processes with bandwidth W and mean power density $\mathcal{N}_0/2$. Thus

$$\overline{n_c^2(t)} = \overline{n_s^2(t)} = \mathcal{N}_0 W. \tag{8.167}$$

If we choose $\Delta \ll 1/2W$, which guarantees $\Delta \ll 1/2W_m$, the
small variation of $n_c(t)$ with time over the interval $[0, \Delta]$ may be
neglected. It follows immediately that

$$P[B_1] = Q\!\left(\frac{A}{\sqrt{\mathcal{N}_0 W}}\right). \tag{8.168a}$$
The determination of $P[B_2]$ is somewhat more complicated. Obviously
it is not reasonable to neglect the variation with time of $n_s(t)$ over
$[0, \Delta]$ when the question of interest is whether $n_s(t)$ will change
from + to - (or conversely) over this interval. It is shown in Appendix 8C
that

$$P[B_2] \approx \frac{2W\Delta}{\sqrt{3}}. \tag{8.168b}$$

Thus we have

$$P[B_1 \cap B_2] \approx \frac{2W\Delta}{\sqrt{3}}\,Q\!\left(\frac{A}{\sqrt{\mathcal{N}_0 W}}\right). \tag{8.169}$$

Since we have chosen $\Delta \ll 1/2W$, it is apparent that the behavior of
the phasor a(t) is statistically dependent from one interval to the next.
But the probability of a union is bounded by the sum of the probabilities
of its constituent events, whether or not they are statistically
independent. Since $n_c(t)$ and $n_s(t)$ are stationary, each constituent of
the union has probability $P[B_1 \cap B_2]$. Neglecting the possibility that
the axis may be crossed without the origin being encircled, we estimate
$P[\mathcal{A}]$ for the unmodulated case by the bound

$$P[\mathcal{A}] \le \frac{1}{2W_m\Delta}\cdot\frac{2W\Delta}{\sqrt{3}}\,Q\!\left(\frac{A}{\sqrt{\mathcal{N}_0 W}}\right) < \frac{1}{\sqrt{3}}\,\frac{W}{W_m}\,e^{-A^2/2\mathcal{N}_0 W}. \tag{8.170}$$

When m(t) is a nonzero constant, the preceding analysis is affected in two
ways. First, $n_c(t)$ and $n_s(t)$ are no longer statistically independent.
Second, as we have seen in Fig. 8.49, the bandwidth of $n_s(t)$ increases,
thereby causing the average number of zero crossings per unit time produced
by $n_s(t)$ to increase. Thus $P[B_2]$ increases somewhat when modulation is
present. A corresponding small increase in $P[\mathcal{A}]$ is observed
experimentally, but Eq. 8.170 remains a good estimate of the probability of
anomaly with conventional FM receivers under the usual operating
conditions of $P[\mathcal{A}] \ll 1$.

Comparison with FPM. For purposes of comparison with FPM it is convenient to rewrite Eq. 8.170 in terms of the energy $E_s = A^2/2W_m$ transmitted during each modulation sampling interval. We then have

pr t i< 1 e ; for conventional FM. (8.171) 

[ J vs WJ >(W 




for conventional FM. (8.171) 
We have already observed (Eq. 8.165 and its sequel) that antipodal FPM affords a mean-square output noise level that is essentially equal to that of FM when both are assigned the same signal energy and are adjusted to utilize equal RF bandwidth. But from Eqs. 8.122c and 8.119 the probability of anomaly for antipodal (or phase-and-frequency) FPM is


Pl^t] 


L W_ _ V- e-BjjJi ' o 
l W m / V2t r(EjX o y 


(8.172) 


It is apparent from Eqs. 8.171 and 8.172 that antipodal FPM affords a 
very much smaller probability of anomaly than conventional FM re- 
ception when both systems are providing the same mean-square noise 
output in the absence of anomaly. 


FM feedback receivers . From the foregoing analysis it is evident that 
the major cause of the poor anomalous performance of conventional FM 
receivers is the large bandwidth required at the limiter-discriminator 
input. This requirement can be ameliorated by the use of a “frequency- 
compressive feedback” [FMFB] receiver. 26 A block diagram is illustrated 
in Fig. 8.58. 



(W<W X ) 

Figure 8.58 FMFB receiver. 


An analysis of the mean-square output noise afforded by idealized FMFB in the absence of anomaly is presented in Appendix 8D. In particular, for noise sufficiently weak that $\mathcal{N}_0(W_1 + 2W_m) \ll A^2$, it is shown that

$$\hat{m}(t) = m(t) + n(t), \tag{8.173a}$$


with 


Uwjw_ 

3 \wj 2E, 


(8.173 b) 


Thus the weak-noise performances of FMFB and conventional FM (Eq. 8.146) are identical.

The strong connection between FM and phase-and-frequency FPM leads us to believe that the probability of anomaly, in addition to the mean-square output noise in the absence of anomaly, of these two systems would be equivalent if maximum-likelihood reception were used. If this conjecture is true, the bandwidth-normalized phase-and-frequency (or antipodal) FPM result of Eq. 8.172,


2— — 5 
. W „ 


-Bj/aJ'T o 


represents an approximate lower bound to the probability of anomaly 
afforded by FMFB receivers. 


8.4 CHANNEL CAPACITY

We have seen that twisted modulation schemes can be used to decrease mean-square error at the cost of introducing a threshold phenomenon. For maximum-likelihood FPM (or PPM) receivers, it was possible to use the union bound to establish relatively simple quantitative measures of the trade-off between these two effects when the maximum energy $E_s$ allotted to the communication of a single (continuous) random variable is held constant and the stretch factor $S$ is increased by increasing the transmission bandwidth. In particular, we have seen that there is a close relation between the probability of anomaly in the continuous-parameter FPM case and the probability of error in the discrete-parameter $M$-orthogonal-signal case.

On the other hand, in Section 5.6 we found that arguments more sensitive than the union bound can be used to derive results that are stronger over certain regions of operation. The union bound on $P[\mathcal{E}]$ with $M$ equally likely orthogonal signals each having energy $E_s$ is

$$P[\mathcal{E}] < M e^{-E_s/2\mathcal{N}_0}. \tag{8.174}$$

But Eq. 5.106 states also that

$$P[\mathcal{E}] < 2 \cdot 2^{-T_m E^*(R)}, \tag{8.175a}$$

where $T_m$ denotes the total signaling interval and

$$M = 2^{T_m R}, \tag{8.175b}$$

$$E^*(R) = \begin{cases} \tfrac{1}{2}C_\infty - R; & 0 \le R \le \tfrac{1}{4}C_\infty, \\ \left(\sqrt{C_\infty} - \sqrt{R}\right)^2; & \tfrac{1}{4}C_\infty \le R \le C_\infty. \end{cases} \tag{8.175c}$$


Here $C_\infty$ is the infinite-bandwidth Gaussian channel capacity,

$$C_\infty = \frac{P_s}{\mathcal{N}_0}\log_2 e, \tag{8.175d}$$

and $P_s = E_s/T_m$ is the signal power. Equations 8.174 and 8.175 are exponentially equivalent for $R < C_\infty/4$, but Eq. 8.175 is both stronger and exponentially tight for values of $R$ larger than this. The reliability function $E^*(R)$ of Eq. 8.175c is replotted for convenience in Fig. 8.59.
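
As a numerical companion to Eqs. 8.175, the following Python sketch evaluates the reliability function for orthogonal signals and the resulting bound on the probability of error. The function names `reliability_exponent` and `error_bound` are ours and do not appear in the text; rates are assumed to be expressed in bits per second.

```python
import math


def reliability_exponent(rate, c_inf):
    """E*(R) for orthogonal signals in white Gaussian noise (form of Eq. 8.175c).

    rate  : information rate R in bits/sec, 0 <= R <= C_inf
    c_inf : infinite-bandwidth capacity (P_s / N_0) * log2(e) in bits/sec
    """
    if not 0.0 <= rate <= c_inf:
        raise ValueError("rate must lie in [0, C_inf]")
    if rate <= c_inf / 4.0:
        # Low-rate region: the exponent coincides with the union bound.
        return c_inf / 2.0 - rate
    # High-rate region: the sharper exponent.
    return (math.sqrt(c_inf) - math.sqrt(rate)) ** 2


def error_bound(rate, c_inf, t_m):
    """Right-hand side of the bound 2 * 2^(-T_m E*(R)) (form of Eq. 8.175a)."""
    return 2.0 * 2.0 ** (-t_m * reliability_exponent(rate, c_inf))
```

The two branches agree at $R = C_\infty/4$, where both equal $C_\infty/4$, and the exponent falls to zero at $R = C_\infty$.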

The identification of the probability of anomaly with the M-orthogonal 
probability of error permits us to use Eq. 8.175, as well as Eq. 8.174, in 



Figure 8.59 Reliability function for orthogonal signals in white Gaussian noise. 

evaluating FPM performance. We need only set $M$ equal to the effective dimensionality of the FPM signaling set and define the parameter $R$ by

$$R = \frac{1}{T_m}\log_2 \beta, \quad \text{or} \quad \beta = 2^{T_m R}. \tag{8.176a}$$

In the notation of Eq. 8.95a, $\beta = 4TW_0$, in which $W_0$ is the maximum instantaneous frequency deviation when $|m| \le 1$. Since the total signaling interval is $T_m = 2T$, we now have

$$\beta = 2T_m W_0. \tag{8.176b}$$

It follows from Eq. 8.95b,

$$\overline{\epsilon^2} \approx \frac{12}{\pi^2\beta^2}\,\frac{\mathcal{N}_0}{2E_s},$$

that the weak noise performance of a maximum-likelihood receiver for ordinary FPM is governed by the parametric relations

$$\overline{\epsilon^2} \approx \frac{12}{\pi^2}\,\frac{\mathcal{N}_0}{2E_s}\,2^{-2T_m R}, \qquad P[\mathcal{A}] < 2 \cdot 2^{-T_m E^*(R)}; \qquad 0 < R < \frac{P_s}{\mathcal{N}_0}\log_2 e. \tag{8.177}$$

In principle $R$ may be chosen anywhere within the allowable range and a desirable balance between $\overline{\epsilon^2}$ and $P[\mathcal{A}]$ obtained. As a practical matter, however, the effective dimensionality of the signal space, $\beta = 2^{T_m R}$, grows exponentially with $R$ so that the largest applicable value of $R$ may be constrained by the available channel bandwidth.

Equation 8.177 implies that $\overline{\epsilon^2}$ can be forced to decrease exponentially with increasing $T_m$ when the transmitter power is held constant. The magnitude of the attainable exponent, however, is limited by the fact that $R$ must be less than $C_\infty$ if $P[\mathcal{A}]$ is to be small. In the boundary case, $R = C_\infty$, we would have

$$\overline{\epsilon^2} = \frac{12}{\pi^2}\,\frac{\mathcal{N}_0}{2E_s}\,e^{-2E_s/\mathcal{N}_0}. \tag{8.178}$$

We see that channel capacity constrains the performance of continuous-parameter, as well as of discrete-parameter, communication systems.

We now present arguments in support of the fact that FPM is, in one sense, an optimum modulation scheme. Specifically, we show that it is not possible to communicate a continuous random variable $m$ with a mean-square error in the absence of anomaly that decays exponentially faster than $e^{-2E_s/\mathcal{N}_0}$ without simultaneously incurring a large probability of anomaly. The model that we shall consider is one in which the transmitted signal $s_m(t)$ occupies a finite number of dimensions. Thus we assume that

$$s_m(t) = \sum_{j=1}^{N} s_j(m)\,\varphi_j(t), \tag{8.179}$$

where the $\{s_j(m)\}$ are differentiable functions of $m$ and $\{\varphi_j(t)\}$ is a set of $N$ orthonormal functions. We shall also assume that the a priori probability density $p_m$ is uniform over the interval $[-1, 1]$ and zero elsewhere.

The question that we seek to answer is how fast the mean-square error $\overline{\epsilon^2}$ can be forced to zero as $N$ is increased, when the maximum allowable transmitted energy per dimension, $E_N$, is held constant and we require simultaneously that the probability of anomaly approach zero.

When transmission is disturbed by additive white Gaussian noise, so that the received signal is

$$r(t) = s_m(t) + n_w(t), \tag{8.180a}$$

the relevant received signal can be represented by the $N$-dimensional vector

$$\mathbf{r} = \mathbf{s}_m + \mathbf{n}. \tag{8.180b}$$

The vector components are the projections of $s_m(t)$ and $n_w(t)$ on the orthonormal functions $\{\varphi_j(t)\}$. The energy constraint implies that

$$\int_{-\infty}^{\infty} s_m^2(t)\,dt \le N E_N. \tag{8.181}$$


Since $N$ is to be a variable, it is convenient (as in the channel-capacity arguments of Section 5.5) to normalize the vector representation by the factor $1/\sqrt{N}$. We define


a 1 

s,„ = — = s„ 


(8.182a) 


A 1 

n = — — n 


(8.182b) 


l ~J} V r a ” + “ 


With this convention, we have 
k (2 = 1 


\t) dl<E H , 


(8.182c) 


(8.183a) 


Ji-.J 

h N k 2 


(8.183b) 


The geometric picture of the resulting communication problem is illustrated in Fig. 8.60. For every $N$, the twisted locus $\mathbf{s}_m$ is constrained to lie on or within an $N$-dimensional sphere of radius $\sqrt{E_N}$. As $N$ approaches infinity, the probability that the squared length of the noise vector $|\mathbf{n}|^2$ exceeds $\mathcal{N}_0/2$ by any amount $\Delta$ goes to zero. Just as in the discrete communication case (Eq. 5.69), it follows that the received vector $\mathbf{r}$ will lie on or within a sphere of radius

$$\sqrt{E_N + \mathcal{N}_0/2 + \Delta}, \tag{8.184}$$

with probability one in the limit $N \to \infty$.

We have already observed in the discussion encompassing Eq. 8.82 that the maximum weak-noise suppression attainable with a signal trace of total length $L$ occurs when $|d\mathbf{s}_m/dm| = L/2$ for all $m$. If we assume that this condition is met, the only remaining task is to determine how large $L$ can be made as $N$ increases while still requiring the probability of anomaly to

^Givenany' signal locus, the maximum-likehhmrd^receiving^strategy^s^ to 

Sul « “S a" implies a /urge (discontinuous) 



Figure 8.60 Normalized N-dimensional signal spheres.


a ^rp n roblem 8 of optimum signal design is to coil a signal locus o«he 

SUCh 3 7 Instead /xpbcitly, « proceed as 

:rdSt"and y s= g ek to establish a bound on how good a norse 

performance can P ossl “^ fa^signal locus that is a straight line of 

5ss.?s rx* — " “ •" ~ ■ 
circle of fixed radius centered on the locus. In other words, the cross section of the tube at every point $\mathbf{s}$ should be an $(N-1)$-dimensional sphere. This follows from the fact that if this condition were not met it would be possible to move distant volume elements to locations that are closer to the signal locus, as shown in Fig. 8.62a for the three-dimensional case.



Figure 8.61 Anomalous error due to boundary crossing. (The dashed lines represent 
the boundaries.) 

Moreover, it is clear from Fig. 8.62b that bending the $N$-dimensional right circular cylinder recommended by the preceding argument increases the probability of a noise vector $\mathbf{n}$ crossing a boundary. This follows from the fact that, given $m = m_0$, $\mathbf{n}$ can be decomposed into two statistically independent components, a one-dimensional component $\mathbf{n}_1$ in the direction of

$$\left.\frac{d\mathbf{s}_m}{dm}\right|_{m=m_0}$$

and an $(N-1)$-dimensional component $\mathbf{n}_2$ that is perpendicular to $\mathbf{n}_1$. When the tube is bent, the tubular cross section erected perpendicular to $\mathbf{n}_1$ at the point $\mathbf{n}_1 + \mathbf{s}_m$ is no longer spherically symmetric about that point, and the probability of $\mathbf{n}_2$ escaping from the tube is therefore increased. We conclude that for a given length $L$ and a constrained volume $V$ a straight line that is the axis of an $N$-dimensional right circular cylinder would be an optimum signal-space geometry.

Figure 8.62 Nonoptimum cylinders.

The rest is easy. The normalized perpendicular noise component $\mathbf{n}_2$ contains $N-1$ statistically independent components, each with variance $(1/N)(\mathcal{N}_0/2)$. With probability one, the length of $\mathbf{n}_2$ therefore approaches $\sqrt{(\mathcal{N}_0/2)(N-1)/N}$ as $N$ approaches infinity. Thus the radius of our right circular cylinder must be at least $\sqrt{(\mathcal{N}_0/2)(N-1)/N}$ if the probability of an anomaly is to be forced to zero. Also, the total volume at our disposal, from Eqs. 8.184 and 5D.5, approaches

$$V = B_N\,(E_N + \mathcal{N}_0/2)^{N/2}.$$

Since we cannot coil a signal locus of length $L$ into the signal sphere in any way that uses the available volume more effectively than a right circular cylinder, it follows that $L$ cannot be so long that the volume of the cylinder would exceed $V$. Thus, for large enough $N$, we must have

$$L\,B_{N-1}\left[\frac{N-1}{N}\,\frac{\mathcal{N}_0}{2}\right]^{(N-1)/2} \le B_N\,(E_N + \mathcal{N}_0/2)^{N/2}. \tag{8.186}$$
We have noted in Appendix 5D that the ratio of the constants $B_N$ and $B_{N-1}$ approaches $\sqrt{2\pi/N}$ when $N$ is large. Moreover,

$$\lim_{N\to\infty}\left(1 - \frac{1}{N}\right)^{-(N-1)/2} = \sqrt{e}.$$

It follows that for large $N$,

$$L \le \sqrt{\frac{2\pi e}{N}}\,\sqrt{\frac{\mathcal{N}_0}{2}}\;2^{N C_N}, \tag{8.188a}$$

in which

$$C_N = \tfrac{1}{2}\log_2\left(1 + \frac{2E_N}{\mathcal{N}_0}\right) \quad \text{bits/dimension} \tag{8.188b}$$

is the Gaussian channel capacity per dimension previously encountered in Eq. 5.59.

Equations 8.188 place an upper bound on the allowable length of the signal locus when we hold $E_N$ and $\mathcal{N}_0/2$ constant and increase $N$ while simultaneously requiring the probability of anomaly to approach zero. The mean-square error in the absence of anomaly is approximated by

$$\overline{\epsilon^2} \approx \frac{E[|\mathbf{n}_1|^2]}{S^2}; \qquad S = \frac{L}{2}.$$

For our scaled geometry the mean-squared length of the one-dimensional vector $\mathbf{n}_1$ is

$$E[|\mathbf{n}_1|^2] = \frac{\mathcal{N}_0}{2N}.$$

It follows that we must have

$$\overline{\epsilon^2} = \frac{\mathcal{N}_0/2}{N L^2/4} \ge \frac{2}{\pi e}\,2^{-2N C_N} \tag{8.189}$$

when $N$ is large.

Equation 8.189 can be rewritten in terms of the transmitted power per second, $P_s$, and the signal's bandwidth $W$ and total duration $T_m$ by identifying

$$N = 2T_m W \tag{8.190a}$$

and

$$P_s = \frac{N E_N}{T_m} = 2W E_N. \tag{8.190b}$$

We then have

$$\overline{\epsilon^2} \ge \frac{2}{\pi e}\,2^{-2T_m C}, \tag{8.191a}$$
(8.191a) 
in which $C$ is the Gaussian channel capacity in bits per second:

$$C = W \log_2\left(1 + \frac{P_s}{\mathcal{N}_0 W}\right). \tag{8.191b}$$

Equation 8.191 is our desired result, and its interpretation is important. We have seen that twisted modulation schemes can suppress “weak” noise at the cost of introducing a threshold phenomenon. Equation 8.191 places an asymptotic bound on how much noise can be considered weak. If we attempt to increase the length of the signal curve (as a function of $T_m$) so rapidly that $\overline{\epsilon^2}$ decreases exponentially with an exponent more negative than $-2T_m C$, the noise perforce becomes “strong” in the sense that threshold considerations become dominant and the probability of anomaly approaches unity. This occurs even when the available number of dimensions is unbounded. Since $C$ is greatest when $W$ becomes infinite and $C_\infty = (P_s/\mathcal{N}_0)\log_2 e$, we see that $\overline{\epsilon^2}$ cannot be made to approach zero exponentially faster than

$$2^{-2C_\infty T_m}, \tag{8.192}$$

which is the same as the strongest exponent afforded by FPM.†
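
The behavior of $C$ as the bandwidth grows can be checked numerically. The Python sketch below assumes the standard band-limited Gaussian capacity formula $C = W \log_2(1 + P_s/\mathcal{N}_0 W)$ and the prefactor $2/\pi e$ as we read it from Eq. 8.191a; the function names are illustrative only and are not taken from the text.

```python
import math


def capacity(power, n0, bandwidth):
    """Gaussian channel capacity in bits/sec, assuming C = W log2(1 + P_s / (N_0 W))."""
    return bandwidth * math.log2(1.0 + power / (n0 * bandwidth))


def capacity_infinite(power, n0):
    """Infinite-bandwidth limit C_inf = (P_s / N_0) log2(e), as in Eq. 8.175d."""
    return (power / n0) * math.log2(math.e)


def mse_lower_bound(power, n0, bandwidth, t_m):
    """Asymptotic bound (2 / (pi e)) * 2^(-2 T_m C); prefactor is our reading of Eq. 8.191a."""
    c = capacity(power, n0, bandwidth)
    return (2.0 / (math.pi * math.e)) * 2.0 ** (-2.0 * t_m * c)
```

With $P_s/\mathcal{N}_0 = 1$, the capacity rises from exactly 1 bit/sec at $W = 1$ toward $\log_2 e \approx 1.44$ bits/sec as $W$ increases, which is why the exponent in Eq. 8.192 cannot be improved by spending more bandwidth.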

8.5 PULSE-CODE MODULATION 

Thus far in this chapter we have been concerned with communicating a continuous random variable $m$ by means of a waveform $s_m(t)$, some attribute of which varies continuously with $m$. An alternative procedure called pulse-code modulation (PCM) involves passing $m$ through a quantizer before modulation, as shown in Fig. 8.63, and then utilizing a discrete communication system.

Let us denote the set of values at the quantizer output by $\{a_i\}$, $i = 0, 1, \ldots, M-1$. With PCM, the index $i$ is communicated by transmitting one of a corresponding set of waveforms, $\{s_i(t)\}$. The receiver decides which waveform was actually transmitted and sets $\hat{m}$ equal to the center value of the quantizer interval corresponding thereto.



Figure 8.63 PCM transmission. 


† The expression for $\overline{\epsilon^2}$ with FPM (Eq. 8.178) includes a factor $\mathcal{N}_0/2E_s$ not present in Eq. 8.191. This discrepancy is attributable to the fact that the $P[\mathcal{A}]$ for FPM is slightly larger than the $P[\mathcal{E}]$ for $\beta$ orthogonal signals. The discrepancy is not exponentially significant.
Conventional PCM

In conventional PCM it is customary to choose the number of quantization intervals to be a power of 2 and to employ a binary antipodal signaling alphabet. The transmitted signal may then be written

$$s_i(t) = \sum_{j=1}^{K} s_{ij}\,\varphi_j(t); \qquad i = 0, 1, \ldots, M-1, \tag{8.193a}$$

in which the $\{\varphi_j(t)\}$ are an appropriate set of orthonormal functions, $K = \log_2 M$, and the vector

$$\mathbf{s}_i = (s_{i1}, s_{i2}, \ldots, s_{iK}) \tag{8.193b}$$

represents the number $i$ written in binary form. For example, if we adopt the mapping $(0 \to -\sqrt{E_N},\ 1 \to +\sqrt{E_N})$ and let $j = 1$ correspond to the most significant digit, then, with $M = 32$,

$$\mathbf{s}_{20} = (+\sqrt{E_N},\ -\sqrt{E_N},\ +\sqrt{E_N},\ -\sqrt{E_N},\ -\sqrt{E_N}).$$

Figure 8.64 Uniform quantization: (a) quantization interval; (b) a priori and conditional probability densities.
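
The binary antipodal mapping just described is easy to mechanize. The following Python sketch builds the signal vector for a given quantizer index, with the most significant digit first; `pcm_vector` and its parameter names are hypothetical helpers introduced here for illustration.

```python
import math


def pcm_vector(index, num_levels, energy_per_dim=1.0):
    """Antipodal vector for quantizer index i (0 -> -sqrt(E_N), 1 -> +sqrt(E_N)).

    num_levels must be a power of 2; component j = 1 (list position 0)
    carries the most significant binary digit of the index.
    """
    k = num_levels.bit_length() - 1          # K = log2(M)
    if num_levels != 1 << k or not 0 <= index < num_levels:
        raise ValueError("need M a power of 2 and 0 <= index < M")
    amplitude = math.sqrt(energy_per_dim)
    bits = [(index >> (k - 1 - j)) & 1 for j in range(k)]
    return [amplitude if bit else -amplitude for bit in bits]


# Index 20 is 10100 in binary, so with M = 32 and E_N = 1:
print(pcm_vector(20, 32))   # [1.0, -1.0, 1.0, -1.0, -1.0]
```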

Provided that the receiver determines the transmitted index $i$ correctly, the error $(m - \hat{m})$ is due solely to the effect of quantization. The mean-square quantization error is easily determined whenever the quantization grain is uniform and sufficiently fine that $p_m$ is essentially constant over each individual quantization interval. Letting $\Delta_i$ denote the $i$th quantization interval in Fig. 8.64a and $a_i$ the interval's midpoint, we have

$$E[(m - \hat{m})^2 \mid m \text{ in } \Delta_i] = \int_{\Delta_i} (\alpha - a_i)^2\,p_m(\alpha \mid \Delta_i)\,d\alpha. \tag{8.194a}$$
But, as shown in Fig. 8.64b, the conditional density function of $m$, given that $m$ falls in $\Delta_i$, is essentially a rectangle of unit area and width $\Delta$ centered on $a_i$. Thus

$$E[(m - \hat{m})^2 \mid m \text{ in } \Delta_i] \approx \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} \alpha^2\,d\alpha = \frac{\Delta^2}{12}. \tag{8.194b}$$

Since the right-hand side is independent of $i$, the over-all mean-square quantization error is also given approximately by

$$\overline{\epsilon^2} \approx \frac{\Delta^2}{12}. \tag{8.195a}$$

When $m$ is constrained to $[-1, +1]$, the number of quantization intervals is $M = 2/\Delta$. In terms of $M$, we have

$$\overline{\epsilon^2} \approx \frac{1}{3M^2}. \tag{8.195b}$$
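
The $1/3M^2$ law can be confirmed numerically for the special case in which $m$ is exactly uniform on $[-1, +1]$ (the text requires only that $p_m$ be nearly constant over each interval). The sketch below averages the squared error of a midpoint quantizer over a fine grid; the function names and the grid size are our own choices.

```python
def quantize(m, num_levels):
    """Map m in [-1, 1] to the midpoint of its uniform quantization interval."""
    step = 2.0 / num_levels                       # interval width, Delta = 2/M
    i = min(int((m + 1.0) / step), num_levels - 1)
    return -1.0 + (i + 0.5) * step


def quantization_mse(num_levels, grid=20_000):
    """Mean-square quantization error for m uniform on [-1, 1] (midpoint rule)."""
    total = 0.0
    for n in range(grid):
        m = -1.0 + 2.0 * (n + 0.5) / grid         # grid points avoid interval edges
        total += (m - quantize(m, num_levels)) ** 2
    return total / grid
```

For $M = 8$ the routine returns a value within about $10^{-9}$ of $1/192$, in agreement with Eq. 8.195b.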


The symbol $\overline{\epsilon^2}$ is used in Eqs. 8.195 because quantization error with PCM is analogous to the error in the absence of anomaly with twisted modulation. This follows from the fact that $\hat{m}$ is again effectively disconnected from $m$ whenever the receiver determines the index $i$ incorrectly, an event that we define by analogy to be anomalous. For conventional PCM in additive white Gaussian noise the probability of anomaly is therefore identified (in accordance with Eq. 5.10) as

$$P[\mathcal{A}] = 1 - \left[1 - Q\!\left(\sqrt{2E_N/\mathcal{N}_0}\right)\right]^{\log_2 M} < (\log_2 M)\,\frac{e^{-E_N/\mathcal{N}_0}}{\sqrt{4\pi E_N/\mathcal{N}_0}}. \tag{8.196}$$
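
To see how tight the bound in Eq. 8.196 is, the exact expression and the bound can be compared numerically. The sketch below assumes that $Q(\cdot)$ denotes the Gaussian tail integral, computed here through the complementary error function; the function names are ours.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pcm_anomaly_probability(num_levels, en_over_n0):
    """Exact form: 1 - [1 - Q(sqrt(2 E_N / N_0))]^(log2 M)."""
    p_bit = q_function(math.sqrt(2.0 * en_over_n0))
    return 1.0 - (1.0 - p_bit) ** math.log2(num_levels)


def pcm_anomaly_bound(num_levels, en_over_n0):
    """Upper bound: (log2 M) exp(-E_N / N_0) / sqrt(4 pi E_N / N_0)."""
    return (math.log2(num_levels) * math.exp(-en_over_n0)
            / math.sqrt(4.0 * math.pi * en_over_n0))
```

The bound is loose at small $E_N/\mathcal{N}_0$ and tightens as the ratio grows, because the approximation to the Gaussian tail improves with increasing argument.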

Conventional PCM is particularly useful when the communication channel is a series of cable links connected in tandem; regenerative repeaters can then be installed at each node. As long as the product of the probability of error per bit on each link times the number of links is negligibly small, cumulative noise and distortion effects are effectively circumvented and $\overline{\epsilon^2}$ represents the only significant degradation effect.


PCM with Error Correction 

The specific relationships among $\overline{\epsilon^2}$, $P[\mathcal{A}]$, $M$, and $E_N/\mathcal{N}_0$ stated in Eqs. 8.195 and 8.196 reflect the nature of the particular (conventional) transmission scheme to which they apply. These relationships, however, are in no sense fundamental attributes of PCM. Indeed, a most intriguing aspect of PCM is that it permits us to break the constraint that causes $P[\mathcal{A}]$ and $\overline{\epsilon^2}$ to be so tightly interrelated in FPM and equivalent modulation systems.

The essence of this constraint is that an FPM signal depends at any time instant on only a single sample of the modulator input process. The import is that the same time parameter, $T_m$, enters into both of the parametric relations

$$\overline{\epsilon^2} \approx \frac{12}{\pi^2}\,\frac{\mathcal{N}_0}{2E_s}\,2^{-2T_m R}, \qquad P[\mathcal{A}] < 2 \cdot 2^{-T_m E^*(R)} \tag{8.197}$$

[cf. Eqs. 8.177].

In contradistinction, quantization of the modulator input frees us from this constraint and reduces the PCM communication problem to one in which the techniques of coding can be exploited. Let us assume that the message process $m(t)$ has bandwidth $W_m$, so that, in the notation of Eq. 8.49a,

$$m(t) = \sum_k m_k\,\psi_k(t). \tag{8.198}$$

If each sample of $m(t)$ is quantized into one of $M$ different levels, in time $T$ there will be $M^{T/T_m}$ different possible messages, in which we have identified $T_m$ as the sampling interval,

$$T_m = \frac{1}{2W_m}. \tag{8.199}$$

In accordance with Eq. 5.2, the communication rate when each message is equally likely is then

$$R = \frac{1}{T}\log_2\left(M^{T/T_m}\right) = \frac{1}{T_m}\log_2 M \quad \text{bits/sec}, \tag{8.200a}$$

whence

$$M = 2^{T_m R}. \tag{8.200b}$$

In coding, the mean-square error (when the decoder output is correct) is again due solely to the effect of quantization. In terms of the parameter $R$, it follows from Eqs. 8.195b and 8.200b that

$$\overline{\epsilon^2} = \tfrac{1}{3}\,2^{-2T_m R}. \tag{8.201a}$$

On the other hand, the probability of a decoding error (anomaly) depends not on $T_m$ but on the code-constraint length, say $T$. Equation 5.106 implies that the probability of anomaly with block orthogonal coding is

$$P[\mathcal{A}] < 2 \cdot 2^{-T E^*(R)}. \tag{8.201b}$$


Here $E^*(R)$ is the same reliability function, plotted in Fig. 8.59, that enters into Eq. 8.197.

The crucial distinction between the parametric relations of Eqs. 8.197 for FPM and Eqs. 8.201 for PCM lies in our ability to choose $T \gg T_m$. Thus $P[\mathcal{A}]$ can be made arbitrarily small for any $R < C_\infty = (P_s/\mathcal{N}_0)\log_2 e$. We observe, however, that the constraint on $R$ again implies that $\overline{\epsilon^2}$ can be forced to zero no faster than $e^{-2E_s/\mathcal{N}_0}$, where $E_s = P_s T_m$ is the energy transmitted per sample of $m(t)$.

The difficulty of unbounded bandwidth implied by block orthogonal coding can be avoided by using convolutional coding and sequential decoding, or any other suitable coding-decoding technique. With sequential decoding, the constraint on $R$ becomes

$$R < R_0, \tag{8.202a}$$

where $R_0$ is the two-message error parameter plotted in Fig. 6.21. For sufficiently large (but finite) channel bandwidths the corresponding constraint on $\overline{\epsilon^2}$ is

? ^ ^ e -s^\\ (8.202b) 

We conclude that a continuous source may be converted to an equivalent discrete source and communicated over a digital system without sacrifice in performance potential. For many systems the availability of digital techniques for achieving operation close to the theoretical limits will make this conversion attractive. Moreover, conversion to a standard digital form serves to focus attention on the problem of reducing the rate of the equivalent digital source without impairing fidelity (for example, by speech and television bandwidth compression). Such matters will be the subject of research for many years to come.


APPENDIX 8A THE SAMPLING THEOREM 


One statement of the sampling theorem is that any waveform, $x(t)$, whose Fourier transform $X(f)$ exists and is identically zero outside the range $-W < f < W$, can be represented by the equation

$$x(t) = \sum_{k=-\infty}^{\infty} x\!\left(\frac{k}{2W}\right) v\!\left(t - \frac{k}{2W}\right). \tag{8A.1}$$

Here the “interpolation function”

$$v(t) \triangleq \frac{\sin 2\pi W t}{2\pi W t} \tag{8A.2a}$$

is the impulse response of an ideal rectangular filter with transfer function

$$V(f) = \begin{cases} 1/2W; & |f| < W, \\ 0; & \text{elsewhere.} \end{cases}$$

Figure 8A.2 Convolution of X(f) with U(f), followed by multiplication by V(f), 
regains X(f). 


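Since the integration in Eq. 8A.3a yields the right-hand side of Eq. 8A.4a, we treat $U(f)$ and $u(t)$ as a Fourier transform pair.

Next, we observe from Fig. 8A.2 that $X(f)$ is regained if we first convolve $X(f)$ with $U(f)$ and then multiply by $V(f)$:

$$X(f) = [X(f) * U(f)]\,V(f). \tag{8A.5}$$

Since convolution in time corresponds to multiplication in frequency and conversely, taking the inverse Fourier transform of both sides of Eq. 8A.5 yields

$$x(t) = [x(t)\,u(t)] * v(t).$$

By interchanging the order of integration and summation, we have

$$x(t) = \sum_{k=-\infty}^{\infty}\int_{-\infty}^{\infty} x(\alpha)\,\delta\!\left(\alpha - \frac{k}{2W}\right) v(t - \alpha)\,d\alpha = \sum_{k=-\infty}^{\infty} x\!\left(\frac{k}{2W}\right) v\!\left(t - \frac{k}{2W}\right), \tag{8A.6}$$

which completes the proof.

In terms of the unit-energy interpolation function

$$\psi(t) \triangleq \sqrt{2W}\,\frac{\sin 2\pi W t}{2\pi W t}, \tag{8A.7a}$$

Eq. 8A.1 is written

$$x(t) = \sum_{k=-\infty}^{\infty} x_k\,\psi\!\left(t - \frac{k}{2W}\right), \tag{8A.7b}$$

$$x_k \triangleq \frac{1}{\sqrt{2W}}\,x\!\left(\frac{k}{2W}\right). \tag{8A.7c}$$

We recall from Section 8.1 that the $\{\psi(t - k/2W)\}$ are orthonormal.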
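
The interpolation formula of Eq. 8A.1 can be exercised directly. The Python sketch below reconstructs a waveform from a finite set of its samples; because the infinite sum must be truncated, the result is only approximate for waveforms whose samples decay slowly. The helper names `interp` and `reconstruct`, and the choice of a dictionary keyed by the sample index $k$, are ours.

```python
import math


def interp(t, w):
    """Interpolation function v(t) = sin(2 pi W t) / (2 pi W t) of Eq. 8A.2a."""
    x = 2.0 * math.pi * w * t
    return 1.0 if x == 0.0 else math.sin(x) / x


def reconstruct(samples, w, t):
    """Truncated sampling-theorem sum: sum over k of x(k/2W) v(t - k/2W).

    samples : dict mapping integer k to the sample value x(k / 2W)
    """
    return sum(xk * interp(t - k / (2.0 * w), w) for k, xk in samples.items())


# Example: x(t) = v(t - 0.3) with W = 0.5 (samples one second apart).
W = 0.5
samples = {k: interp(k - 0.3, W) for k in range(-2000, 2001)}
approx = reconstruct(samples, W, 0.1)    # compare with x(0.1) = v(-0.2)
```

At the sampling instants themselves every term but one vanishes, so the sum returns the stored sample; between samples the truncated sum agrees with the true waveform to within the neglected tail.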

Discussion. The condition that $X(f)$ vanish outside $[-W, W]$ is critical in the proof of the sampling theorem. Otherwise, convolution of $X(f)$ and $(1/2W)U(f)$ causes spectral overlap from one frequency band of width $2W$ to the next, as indicated in Fig. 8A.3. The resulting distortion is called “aliasing.”


Since ideally bandlimited waveforms are physically unrealizable, sampling, in principle, always introduces a certain amount of aliasing. As a practical matter, however, this distortion becomes negligible when $x(t)$ is effectively bandlimited and the sampling period is selected judiciously. Consider, for example, the waveform $x(t)$ and spectrum $|X(f)|$ shown in Fig. 8A.4. If we define



(b) 


Figure 8A.4 A waveform that is effectively limited both in duration and bandwidth.

then from Fig. 8A.3 and Parseval's theorem we have

$$\int_{-\infty}^{\infty}[x(t) - y(t)]^2\,dt = \int_{-\infty}^{\infty}|X(f) - Y(f)|^2\,df = 2\int_{W}^{\infty}|X(f)|^2\,df + \int_{-W}^{W}\Bigl|{\sum_k}' X(f - 2kW)\Bigr|^2 df, \tag{8A.8b}$$

in which the primed sum does not include $k = 0$. If $X(f)$ is sufficiently well-behaved, we can find a constant $B$ such that

$$\Bigl|{\sum_k}' X(f - 2kW)\Bigr| \le B \max_k |X(f - 2kW)|; \qquad \text{all } |f| < W.$$
Thus $y(t)$ is a good approximation to $x(t)$ whenever the sampling interval $1/2W$ is chosen small enough that

$$(2 + B^2)\int_{W}^{\infty}|X(f)|^2\,df \ll \int_{-\infty}^{\infty}|X(f)|^2\,df = \int_{-\infty}^{\infty} x^2(t)\,dt. \tag{8A.9}$$

A further approximation to $x(t)$, say $z(t)$, results if we discard all terms in Eq. 8A.8a for which $|k|$ is greater than some integer $K/2$:

$$z(t) = \sum_{k=-K/2}^{K/2} x_k\,\psi\!\left(t - \frac{k}{2W}\right). \tag{8A.10}$$

By virtue of the orthonormality of the $\{\psi(t - k/2W)\}$, we then have

$$\int_{-\infty}^{\infty}[y(t) - z(t)]^2\,dt = \sum_{k=-\infty}^{-(1+K/2)} x_k^2 + \sum_{k=1+(K/2)}^{\infty} x_k^2. \tag{8A.11a}$$

Thus $z(t)$ is a good $(K+1)$-dimensional approximation to $y(t)$, hence to the original waveform $x(t)$, as long as $K$ is chosen to be sufficiently great that the sums on the right-hand side of Eq. 8A.11 are also much less than the energy of $x(t)$:

$$\int_{-\infty}^{\infty}[y(t) - z(t)]^2\,dt \ll \int_{-\infty}^{\infty} x^2(t)\,dt. \tag{8A.11b}$$

For any lowpass waveform $x(t)$ encountered in engineering practice, minimum values can be chosen for $K$ and $W$ such that Eqs. 8A.9 and 8A.11b are both satisfied. In common parlance, $x(t)$ is then said to have effective bandwidth $W$ and effective duration $T = K/2W$. Sampling techniques are useful in obtaining a finite-dimensional approximation to such a waveform. For precise relations governing the minimum dimensionality of classes of time- and bandwidth-limited waveforms refer to Appendix 5A.


APPENDIX 8B OPTIMUM MEAN-SQUARE LINEAR FILTERING 


The problem of optimum (i.e., minimum mean-square error) receiver design when linearly modulated signals are corrupted by additive white Gaussian noise $n_w(t)$ was considered from a sampling theorem point of view in Section 8.1. In particular, when $m(t)$ is a stationary Gaussian process with mean power density function

$$S_m(f) = \begin{cases} \mathcal{M}_0/2; & -W_m < f < W_m, \\ 0; & \text{elsewhere,} \end{cases} \tag{8B.1}$$

we found (Eq. 8.63) that the optimum receiver for

$$r(t) = A\,m(t) + n_w(t) \tag{8B.2}$$

consists of an ideal linear filter with transfer function

$$H(f) = \begin{cases} \dfrac{1/A}{1 + \mathcal{N}_0/2E_m}; & |f| < W_m, \\[1ex] 0; & \text{elsewhere.} \end{cases} \tag{8B.3a}$$

Here $\mathcal{N}_0/2$ is the mean power density of $n_w(t)$, and

$$E_m \triangleq \frac{A^2\,\overline{m^2(t)}}{2W_m} \tag{8B.3b}$$

is the mean energy transmitted during a sampling interval.

We now use the techniques of optimum mean-square linear filter theory 14 - 55 to obtain Eq. 8B.3 in a different way. Although the fact that the optimum receiver in this case is indeed a linear filter cannot be established by these techniques, the theory does show that the transfer function of Eq. 8B.3 is best within the class of all linear receivers. In addition, the theory may be used to investigate the performance degradation that ensues when $H(f)$ is required to be physically realizable; that is, when we impose the additional constraint that the impulse response of $H(f)$ must satisfy

$$h(t) = 0; \qquad \text{all } t < 0. \tag{8B.4}$$

It is convenient to formulate the minimum mean-square error linear filtering problem in a quite general way. Let $z(t)$ be the desired signal and let $x(t)$ be the input to a linear filter with impulse response $h(t)$. We wish to design $h(t)$ in such a way that the filter output, say $\hat{z}(t)$, will minimize the mean-square error

$$\overline{\epsilon^2(t)} \triangleq \overline{[z(t) - \hat{z}(t)]^2}. \tag{8B.5}$$

The constraint of Eq. 8B.4 is introduced by writing

$$\hat{z}(t) = \int_I x(t - \alpha)\,h(\alpha)\,d\alpha, \tag{8B.6}$$

in which the domain of integration $I$ is taken to be $[0, \infty]$ if the filter must be physically realizable. Otherwise, we take $I$ to be $[-\infty, \infty]$.

In determining the best linear filter, we need not require that $x(t)$ and $z(t)$ be Gaussian processes. We do assume that both processes are wide-sense stationary, with known correlation functions $\mathcal{R}_x(\tau)$ and $\mathcal{R}_z(\tau)$, and that the crosscorrelation function of $z$ and $x$ is wide-sense stationary and known:

$$\overline{z(t)\,x(s)} \triangleq \mathcal{R}_{zx}(t - s). \tag{8B.7}$$
As an example, in the communication problem of Fig. 8B.1 the desired output might be a delayed replica of $m(t)$:

$$z(t) = m(t - T). \tag{8B.8}$$

Then

$$\mathcal{R}_{zx}(\tau) = \overline{z(t)\,x(t - \tau)} = \overline{m(t - T)[A\,m(t - \tau) + n_w(t - \tau)]} = A\,\overline{m(t - T)\,m(t - \tau)} = A\,\mathcal{R}_m(\tau - T), \tag{8B.9}$$

in which we have used the fact that $n_w(t)$ is a zero-mean noise process that is independent of $m(t)$. Negative values of $T$ correspond to prediction of $m(t)$.



Figure 8B.1 A communication problem in which optimum mean-square linear filter theory may be used to design the receiving filter $h(t)$.


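We first prove that $h$ is optimum if and only if the resulting error, $\epsilon(t) = z(t) - \hat{z}(t)$, is uncorrelated with the filter input $x(t)$ for all time displacements within the domain $I$; that is, if and only if

$$\overline{\epsilon(t)\,x(t - \tau)} = 0; \qquad \text{for all } \tau \text{ in } I. \tag{8B.10}$$

Proof is immediate. Let $h$ be the filter that satisfies Eq. 8B.10 and let $g$ be any other linear filter. We use $\hat{z}(t)$ to denote the output of $h$ and $\tilde{z}(t)$ to denote the output of $g$. Then

$$\overline{[z(t) - \tilde{z}(t)]^2} = \overline{[z(t) - \hat{z}(t) + \hat{z}(t) - \tilde{z}(t)]^2} = \overline{[z(t) - \hat{z}(t)]^2} + \overline{[\hat{z}(t) - \tilde{z}(t)]^2} + 2\,\overline{[z(t) - \hat{z}(t)][\hat{z}(t) - \tilde{z}(t)]}.$$

But $[z(t) - \hat{z}(t)] = \epsilon(t)$, and $\epsilon(t)$ satisfies Eq. 8B.10. Thus

$$\overline{[z(t) - \hat{z}(t)][\hat{z}(t) - \tilde{z}(t)]} = \overline{\epsilon(t)\int_I x(t - \tau)[h(\tau) - g(\tau)]\,d\tau} = \int_I \overline{\epsilon(t)\,x(t - \tau)}\,[h(\tau) - g(\tau)]\,d\tau = 0.$$

It follows that

$$\overline{[z(t) - \tilde{z}(t)]^2} = \overline{[z(t) - \hat{z}(t)]^2} + \overline{[\hat{z}(t) - \tilde{z}(t)]^2},$$

which is clearly minimum when

$$\tilde{z}(t) = \hat{z}(t).$$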

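Equation 8B.10 is called the Wiener-Hopf condition. The remaining problem is to solve this equation for the function $h$ that satisfies the condition. The first step is to rewrite Eq. 8B.10 in the form

$$\overline{[z(t) - \hat{z}(t)]\,x(t - \tau)} = 0; \qquad \text{for all } \tau \text{ in } I,$$

which yields

$$\overline{z(t)\,x(t - \tau)} = \overline{\hat{z}(t)\,x(t - \tau)} = \int_I \overline{x(t - \tau)\,x(t - \alpha)}\,h(\alpha)\,d\alpha,$$

or

$$\mathcal{R}_{zx}(\tau) = \int_I \mathcal{R}_x(\alpha - \tau)\,h(\alpha)\,d\alpha; \qquad \text{for all } \tau \text{ in } I. \tag{8B.11}$$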

Unrealizable Filters

Solving Eq. 8B.11 for the optimum filter is simple when $I = [-\infty, \infty]$, that is, when $h$ is permitted to be unrealizable. Taking the Fourier transform of both sides, we then have

$$S_{zx}(f) = S_x(f)\,H(f),$$

in which $S_{zx}(f)$, the transform of $\mathcal{R}_{zx}(\tau)$, is the cross power density function of $z(t)$ and $x(t)$. Thus the transfer function of the optimum linear filter is

$$H(f) = \frac{S_{zx}(f)}{S_x(f)}; \qquad \text{for } I = [-\infty, \infty]. \tag{8B.12}$$

Equation 8B.12 is a general result applicable to any case for which $S_{zx}(f)$ and $S_x(f)$ are known.
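
A short numerical check illustrates how Eq. 8B.12 specializes to a flat in-band gain. The Python sketch below assumes a message density $\mathcal{M}_0/2$ inside the band, white noise of density $\mathcal{N}_0/2$, and $E_m = A^2\mathcal{M}_0/2$; all function and parameter names are ours, and real-valued densities are assumed.

```python
def optimum_transfer(s_zx, s_x):
    """Unrealizable optimum filter at one frequency: H = S_zx / S_x (Eq. 8B.12)."""
    return s_zx / s_x


def in_band_gain(a, m0, n0):
    """Evaluate H(f) for |f| < W_m when x(t) = A m(t) + white noise.

    Assumes S_m = M_0 / 2 in band and noise density N_0 / 2, so that
    S_zx = A * S_m and S_x = A^2 * S_m + N_0 / 2.
    """
    s_m = m0 / 2.0
    return optimum_transfer(a * s_m, a * a * s_m + n0 / 2.0)


def closed_form_gain(a, m0, n0):
    """Closed form (1/A) / (1 + N_0 / (2 E_m)) with E_m = A^2 M_0 / 2."""
    e_m = a * a * m0 / 2.0
    return (1.0 / a) / (1.0 + n0 / (2.0 * e_m))
```

The two functions return the same value for any positive $A$, $\mathcal{M}_0$, and $\mathcal{N}_0$; outside the band $S_{zx}$ vanishes and so does $H(f)$.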

It is easy to show that Eq. 8B.12 reduces to Eq. 8B.3 in the special case in which we identify $x(t)$ with the received signal in Eq. 8B.2 and $z(t)$ with $m(t)$. We then have

$$\mathcal{R}_{zx}(\tau) = \overline{m(t)\,[A\,m(t - \tau) + n_w(t - \tau)]} = A\,\mathcal{R}_m(\tau). \tag{8B.13a}$$
A simple (although at first glance artificial) application occurs when $S_m(f)$ is given by Eq. 8B.1 and

$$x(t) = A\,m(t) + n(t), \tag{8B.18a}$$

where $n(t)$ is stationary and

$$S_n(f) = \begin{cases} \dfrac{N_0}{2}; & |f| < W_m, \\[1ex] \dfrac{N_0 + A^2\mathcal{M}_0}{2}; & |f| > W_m. \end{cases} \tag{8B.18b}$$

Assuming that $n(t)$ and $m(t)$ are statistically independent, we then have

$$S_x(f) = A^2 S_m(f) + S_n(f) = \frac{N_0 + A^2\mathcal{M}_0}{2}; \qquad \text{for all } f. \tag{8B.19}$$

If we further assume that the desired receiver output is a delayed replica of $m(t)$, as in Eq. 8B.8, then $\mathcal{R}_{zx}(\tau)$ is given by Eq. 8B.9 even though $n(t)$ is not white. Identifying $\mathcal{N}_0$ with $N_0 + A^2\mathcal{M}_0$, we have

$$h(t) = \begin{cases} \dfrac{2A}{N_0 + A^2\mathcal{M}_0}\,\mathcal{R}_m(t - T); & t \ge 0, \\[1ex] 0; & \text{elsewhere.} \end{cases} \tag{8B.20a}$$

Since $\mathcal{R}_m(\tau) = \mathcal{M}_0 W_m (\sin 2\pi W_m \tau)/2\pi W_m \tau$ and $E_m = A^2\mathcal{M}_0/2$, Eq. 8B.20a can also be written as

$$h(t) = \begin{cases} \dfrac{2W_m/A}{1 + N_0/2E_m}\,\dfrac{\sin 2\pi W_m(t - T)}{2\pi W_m(t - T)}; & t \ge 0, \\[1ex] 0; & t < 0. \end{cases} \tag{8B.20b}$$

Figure 8B.2 Delayed impulse response of an ideal lowpass filter; the optimum realizable approximation is obtained by deleting the (dashed) section to the left of $t = 0$.

As shown in Fig. 8B.2, as $T$ becomes large $h(t)$ approaches the (delayed) impulse response of a rectangular lowpass filter with gain

$$[A(1 + N_0/2E_m)]^{-1},$$

which again agrees with Eq. 8B.3. The agreement reflects the fact that the noise outside the band $[-W_m, W_m]$ is irrelevant in the limit $T \to \infty$. The delay $T$ is introduced by the requirement for physical realizability. If the value of $T$ is negative, the output from the filter of Eq. 8B.20 is the optimum mean-square linear prediction of the value that $m(t)$ will assume $T$ seconds later.
Mean-Square Error 

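The mean-square error for the optimum linear filter of Eq. 8B.17 is obtained as follows: we first write

$$\overline{e^2(t)} = \overline{e(t)\,[z(t) - \hat{z}(t)]} = \overline{e(t)\,z(t)},$$

in which the second equality follows from the Wiener-Hopf condition:

$$\overline{e(t)\,\hat{z}(t)} = \overline{e(t)\int_I x(t - \tau)\,h(\tau)\,d\tau} = \int_I \overline{e(t)\,x(t - \tau)}\,h(\tau)\,d\tau = 0.$$

Hence

$$\overline{e^2(t)} = \overline{z^2(t)} - \int_I \overline{z(t)\,x(t - \tau)}\,h(\tau)\,d\tau = \mathcal{R}_z(0) - \int_I \mathcal{R}_{zx}(\tau)\,h(\tau)\,d\tau. \tag{8B.21a}$$

Invoking Eq. 8B.17, we have

$$\overline{e^2(t)} = \mathcal{R}_z(0) - \frac{2}{\mathcal{X}_0}\int_I \mathcal{R}_{zx}^2(\tau)\,d\tau; \quad \text{for } x(t) \text{ white.} \tag{8B.21b}$$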

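As an example, for the case leading to Eq. 8B.20b we have

$$I = [0, \infty],$$

$$\mathcal{R}_z(0) = \overline{m^2(t - T)} = \overline{m^2(t)} = \mathcal{M}_0 W_m,$$

$$\mathcal{X}_0 = \mathcal{N}_0 + A^2\mathcal{M}_0,$$

$$\mathcal{R}_{zx}(\tau) = A\,\mathcal{R}_m(\tau - T) = A\,\mathcal{M}_0 W_m\,\frac{\sin 2\pi W_m(\tau - T)}{2\pi W_m(\tau - T)}.$$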
Thus

$$\overline{e^2(t)} = \mathcal{M}_0 W_m\left\{1 - (1 + \mathcal{N}_0/2E_m)^{-1}\int_0^\infty 2W_m\left[\frac{\sin 2\pi W_m(\tau - T)}{2\pi W_m(\tau - T)}\right]^2 d\tau\right\}. \tag{8B.22}$$

The integral in Eq. 8B.22 is plotted in Fig. 8B.3 as a function of the normalized delay $2W_mT$. As $T$ becomes large, the integral approaches unity. Thus the signal-to-noise ratio,

$$\frac{\mathcal{S}}{\mathcal{N}} \triangleq \frac{\overline{m^2(t)}}{\overline{e^2(t)}} = \frac{\mathcal{M}_0 W_m}{\overline{e^2(t)}}, \tag{8B.23a}$$

approaches the value obtained with the unrealizable filter of Eq. 8B.15:

$$\frac{\mathcal{S}}{\mathcal{N}} \to 1 + \frac{2E_m}{\mathcal{N}_0} \quad \text{as } T \to \infty. \tag{8B.23b}$$

We observe that $2W_mT$ need not be greater than unity in order to obtain an $\mathcal{S}/\mathcal{N}$ that is very nearly equal to this limiting value. The utility of the somewhat artificial noise power density function specified in Eq. 8B.18b lies in the fact that Fig. 8B.3 implies a lower bound on the value of $\mathcal{S}/\mathcal{N}$ attainable with realizable filters when $S_n(f) = \mathcal{N}_0/2$ for all $f$; clearly, increasing $S_n(f)$ outside $[-W_m, W_m]$ can only reduce $\mathcal{S}/\mathcal{N}$. On the other hand, the $\mathcal{S}/\mathcal{N}$ when the noise is white cannot exceed the limiting value of Eq. 8B.23b. Thus we have tight upper and lower bounds on $\mathcal{S}/\mathcal{N}$ for white additive noise and $T \ge 1/2W_m$.
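The dependence of the integral in Eq. 8B.22 on the normalized delay can be checked numerically. The Python sketch below is an illustration under stated assumptions rather than part of the original analysis: the function names `sinc_sq_integral` and `snr_realizable` and the symbol `rho` (standing for $2E_m/\mathcal{N}_0$) are hypothetical, the change of variable $u = 2W_m(\tau - T)$ is used to rewrite the integral as $\tfrac{1}{2} + \int_0^{2W_mT}(\sin\pi u/\pi u)^2\,du$, and the remaining integral is evaluated with the trapezoidal rule.

```python
import math

def sinc_sq(u):
    """(sin(pi u) / (pi u))**2, with the removable singularity at u = 0."""
    if u == 0.0:
        return 1.0
    x = math.pi * u
    return (math.sin(x) / x) ** 2

def sinc_sq_integral(delay, steps_per_unit=2000):
    """Integral of Eq. 8B.22 versus the normalized delay 2*Wm*T (delay >= 0)."""
    if delay == 0.0:
        return 0.5
    n = max(1, int(math.ceil(delay * steps_per_unit)))
    h = delay / n
    # Trapezoidal rule on [0, delay]; the half-line u < 0 contributes 1/2.
    total = 0.5 * (sinc_sq(0.0) + sinc_sq(delay))
    for k in range(1, n):
        total += sinc_sq(k * h)
    return 0.5 + h * total

def snr_realizable(delay, rho):
    """Signal-to-noise ratio of Eq. 8B.23a, with rho = 2*E_m/N_0."""
    integral = sinc_sq_integral(delay)
    return 1.0 / (1.0 - integral * rho / (1.0 + rho))

rho = 10.0
for d in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(d, round(sinc_sq_integral(d), 4), round(snr_realizable(d, rho), 3))
```

With zero delay the integral equals one half, and it climbs toward unity as the delay grows, so the computed signal-to-noise ratio approaches $1 + 2E_m/\mathcal{N}_0$ from below.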
The residual error when $T \to \infty$ is called “irreducible.” Setting $I = [-\infty, \infty]$ in Eq. 8B.21a, and introducing $h(t)$ from Eq. 8B.12, we observe that the irreducible error is equal to

$$\overline{e^2_{\text{irr}}} = \mathcal{R}_z(0) - \int_{-\infty}^{\infty}\mathcal{R}_{zx}(\tau)\,d\tau\int_{-\infty}^{\infty} H(f)\,e^{j2\pi f\tau}\,df$$

$$= \mathcal{R}_z(0) - \int_{-\infty}^{\infty} H(f)\,S_{zx}^*(f)\,df$$

$$= \int_{-\infty}^{\infty}\left[S_z(f) - \frac{|S_{zx}(f)|^2}{S_x(f)}\right]df. \tag{8B.24}$$

Equation 8B.24 is valid whether or not the process $x(t)$ is white.


Nonwhite x(t) 

When $x(t)$ is nonwhite and $I = [0, \infty]$, a reversible whitening filter (cf. Appendix 7A) may be used to whiten $x(t)$ as a first step in optimum linear filtering without degradation of the attainable performance. Let $x_w(t)$ denote the output of the whitening filter. As shown in Fig. 8B.4, we may then complete the optimum filtering job by concatenating the optimum filter for estimating $z(t)$ from $x_w(t)$. This filter is again specified by Eq. 8B.17, with $\mathcal{R}_{zx_w}(\tau)$ substituted for $\mathcal{R}_{zx}(\tau)$. The same substitution enables us to use Eq. 8B.21b to evaluate the resulting mean-square error.

Figure 8B.4 Prewhitening.


APPENDIX 8C DETERMINATION OF THE PROBABILITY THAT $n_s(t)$ CROSSES ZERO DURING A SHORT INTERVAL


Let $x$ denote $n_s(t)\,|_{t=0}$ and let $y$ denote $dn_s(t)/dt\,|_{t=0}$. Now, as shown in Fig. 8C.1, $n_s(t)$ passes through zero from $-$ to $+$ during the interval $[0, \Delta]$, where $\Delta \ll 1/2W$, if and only if $x < 0$ and $x + y\Delta > 0$. Thus the probability of a ($-$ to $+$) zero-crossing over $[0, \Delta]$, say $P$, is

$$P = \int_0^\infty d\beta\int_{-\beta\Delta}^{0} p_{x,y}(\alpha, \beta)\,d\alpha. \tag{8C.1}$$
Figure 8C.1 Conditions for positive-going zero-crossing. Since $n_s(t)$ has bandwidth $W$, neither $n_s(t)$ nor $n_s'(t)$ can change appreciably over the interval $[0, \Delta]$.


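Since $dn_s(t)/dt = n_s'(t)$ is obtainable by passing $n_s(t)$ through a filter with transfer function

$$H'(f) = j2\pi f \tag{8C.2}$$

both $x$ and $y$ are Gaussian random variables. To determine $p_{x,y}$, it is therefore only necessary to calculate $\overline{x^2}$, $\overline{y^2}$, and $\overline{xy}$. Since

$$S_{n_s}(f) = \begin{cases} \dfrac{\mathcal{N}_0}{2}, & \text{for } |f| < W \\[4pt] 0, & \text{elsewhere,} \end{cases} \tag{8C.3}$$

we have immediately

$$\overline{x^2} = \int_{-\infty}^{\infty} S_{n_s}(f)\,df = \mathcal{N}_0 W \tag{8C.4}$$

and

$$\overline{y^2} = (2\pi)^2\int_{-\infty}^{\infty} f^2 S_{n_s}(f)\,df = \tfrac{1}{3}(2\pi W)^2(\mathcal{N}_0 W). \tag{8C.5}$$

Thus

$$\frac{\overline{y^2}}{\overline{x^2}} = \tfrac{1}{3}(2\pi W)^2, \tag{8C.6}$$

a result that we shall invoke later. [In the general (nonrectangular) case we would have

$$\frac{\overline{y^2}}{\overline{x^2}} = \frac{(2\pi)^2\displaystyle\int_{-\infty}^{\infty} f^2 S_{n_s}(f)\,df}{\displaystyle\int_{-\infty}^{\infty} S_{n_s}(f)\,df} \tag{8C.7}$$

provided that the integral in the numerator converges.]

In order to determine $\overline{xy}$ we first note that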
Letting $\gamma = \beta^2/2\overline{y^2}$ yields

$$P = \frac{\Delta}{2\pi}\sqrt{\frac{\overline{y^2}}{\overline{x^2}}}\int_0^\infty e^{-\gamma}\,d\gamma = \frac{\Delta}{2\pi}\sqrt{\overline{y^2}/\overline{x^2}}. \tag{8C.13}$$

By symmetry, the probability of a zero crossing from $+$ to $-$ is also equal to $P$. Thus the total probability that $n_s(t)$ will cross zero in any direction in a small time interval $[0, \Delta]$ is approximately

$$2P = \frac{\Delta}{\pi}\,\frac{2\pi W}{\sqrt{3}} = \frac{2W\Delta}{\sqrt{3}}. \tag{8C.14}$$

In the case of arbitrary (nonrectangular) $S_{n_s}(f)$, $2P$ can be evaluated by substituting Eq. 8C.7 in Eq. 8C.13.
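The substitution of Eq. 8C.7 in Eq. 8C.13 is easy to carry out numerically for any even power density confined to $|f| \le W$. The Python sketch below is a hedged illustration, not part of the original appendix: the function name `crossing_probability` is hypothetical, the density is supplied as a callable on $[0, W]$, and the integrals are approximated by the midpoint rule. It evaluates $2P = (\Delta/\pi)\sqrt{\overline{y^2}/\overline{x^2}}$ and compares the rectangular case with $2W\Delta/\sqrt{3}$ from Eq. 8C.14.

```python
import math

def crossing_probability(psd, W_limit, delta, n=20000):
    """Approximate 2P for an even power density psd(f) confined to |f| <= W_limit.

    Uses Eq. 8C.7 for mean(y^2)/mean(x^2) and 2P = (delta/pi)*sqrt(ratio).
    """
    h = W_limit / n
    num = 0.0
    den = 0.0
    # Midpoint rule on [0, W_limit]; the factor 2 for f < 0 cancels in the ratio.
    for k in range(n):
        f = (k + 0.5) * h
        s = psd(f)
        num += f * f * s
        den += s
    ratio = (2.0 * math.pi) ** 2 * num / den
    return (delta / math.pi) * math.sqrt(ratio)

W = 1000.0
delta = 1.0e-5  # chosen so that delta << 1/(2W)

rect = crossing_probability(lambda f: 0.5, W, delta)
closed_form = 2.0 * W * delta / math.sqrt(3.0)
print(rect, closed_form)
```

The absolute level of the density cancels in the ratio, so only the shape of $S_{n_s}(f)$ matters; a spectrum that tapers toward the band edge yields fewer crossings than the rectangular one.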


APPENDIX 8D WEAK-NOISE PERFORMANCE OF FM 
FEEDBACK RECEIVERS 

Consider the idealized FMFB receiver diagrammed in Fig. 8D.1. We assume that $|m(t)| \le 1$ and

$$s_m(t) = A\sqrt{2}\cos 2\pi\left[f_0 t + W_1\int m(t)\,dt\right], \tag{8D.1a}$$

$$\hat{s}(t) = \sqrt{2}\cos 2\pi\left[f_1 t + \hat{W}\int\hat{m}(t)\,dt\right]; \quad \hat{W} < W_1. \tag{8D.1b}$$

Our first task is to determine the setting of the attenuator $G$ which leads to perfect reception in the absence of noise. The conditions are that the modulating process $m(t)$ has ideal bandwidth $W_m$; the RF half-bandwidth of $s_m(t)$ and $H_1(f)$ is $W = W_1 + 2W_m$; and the center frequency of $H_2(f)$ is $f_2 = f_0 - f_1$.

Figure 8D.1 Idealized FMFB receiver.

When $n_w(t) = 0$, the difference-frequency (DF) component of the IF filter input is

$$r_2(t)\big|_{DF} = A\cos[2\pi f_2 t + \theta(t) - \hat{\theta}(t)] \tag{8D.2a}$$

in which

$$\theta(t) = 2\pi W_1\int m(t)\,dt, \tag{8D.2b}$$

$$\hat{\theta}(t) = 2\pi\hat{W}\int\hat{m}(t)\,dt. \tag{8D.2c}$$

It is convenient to assume initially that the bandwidth of $H_2(f)$ is large enough that

$$r_3(t) = r_2(t)\big|_{DF}.$$

By convention, the output of the limiter-discriminator is then

$$r_4(t) = \frac{d}{dt}[\theta(t) - \hat{\theta}(t)] = \theta'(t) - \hat{\theta}'(t).$$

Since $m(t)$ and $\hat{m}(t)$ both have bandwidth $W_m$,

$$\hat{m}(t) = G[\theta'(t) - \hat{\theta}'(t)] * w_m(t) = G[\theta'(t) - \hat{\theta}'(t)] = 2\pi G[W_1 m(t) - \hat{W}\hat{m}(t)].$$

Thus

$$\hat{m}(t)[1 + 2\pi\hat{W}G] = 2\pi G W_1 m(t).$$

It follows that $\hat{m}(t) = m(t)$ when

$$G = \frac{1}{2\pi(W_1 - \hat{W})}. \tag{8D.3}$$
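The attenuator setting of Eq. 8D.3 can be sanity-checked by solving the noiseless loop equation for the receiver output. The Python sketch below is illustrative only: the function name `fmfb_estimate` and the numerical values of the two frequency deviations are hypothetical. It solves $\hat{m} = 2\pi G(W_1 m - \hat{W}\hat{m})$ for $\hat{m}$ and confirms that the output reproduces the message when $G = [2\pi(W_1 - \hat{W})]^{-1}$.

```python
import math

def fmfb_estimate(m, W1, W_hat, G):
    """Solve m_hat = 2*pi*G*(W1*m - W_hat*m_hat) for m_hat (noiseless loop)."""
    return 2.0 * math.pi * G * W1 * m / (1.0 + 2.0 * math.pi * W_hat * G)

# Illustrative (hypothetical) deviations in cps, with W_hat < W1.
W1, W_hat = 75.0e3, 60.0e3
G = 1.0 / (2.0 * math.pi * (W1 - W_hat))  # attenuator setting of Eq. 8D.3

samples = [-1.0, -0.3, 0.0, 0.4, 1.0]
recovered = [fmfb_estimate(m, W1, W_hat, G) for m in samples]
print(recovered)
```

Any other value of $G$ scales the output by the constant factor $2\pi G W_1/(1 + 2\pi\hat{W}G)$, which is why a single attenuator adjustment suffices.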


We next determine the minimum allowable bandwidth of $H_2(f)$. When $\hat{m}(t) = m(t)$, we have

$$\theta(t) - \hat{\theta}(t) = 2\pi(W_1 - \hat{W})\int m(t)\,dt.$$

Thus the difference-frequency component at the input to the IF filter is an FM signal whose maximum instantaneous frequency deviation is

$$\Delta f_{\max} = 2\pi(W_1 - \hat{W})\,|m(t)|_{\max}.$$
The objective of FMFB is to reduce the probability of anomaly by reducing the IF bandwidth. We therefore assume that the feedback is sufficiently strong (that is, $\hat{W}$ is sufficiently large) that $\Delta f_{\max} \ll W_m$. In this case $r_2(t)|_{DF}$ is a narrow-band FM signal which we may assume passes through $H_2(f)$ with negligible distortion as long as the filter half-bandwidth is at least $W_m$ cps. Accordingly, we take

$$H_2(f) = \begin{cases} 1; & f_2 - W_m < |f| < f_2 + W_m \\ 0; & \text{elsewhere.} \end{cases} \tag{8D.4}$$


The final step in our analysis is to investigate the effects of weak noise when $G$ and $H_2(f)$ are given by Eqs. 8D.3 and 8D.4. We consider the case in which there is no modulation, that is, $m(t) = 0$. We then have

$$r_1(t) = A\sqrt{2}\cos\omega_0 t + \text{bandpass noise} = [A + n_c(t)]\sqrt{2}\cos\omega_0 t + n_s(t)\sqrt{2}\sin\omega_0 t, \tag{8D.5}$$

where $n_c(t)$ and $n_s(t)$ are statistically independent lowpass Gaussian processes, each with power density spectrum equal to $\mathcal{N}_0/2$ over $[-W, W]$. Introducing the polar transformation of Fig. 8.50 yields

$$r_1(t) = a(t)\sqrt{2}\cos[\omega_0 t + \phi(t)] \tag{8D.6a}$$

in which

$$\phi(t) = -\tan^{-1}\frac{n_s(t)}{A + n_c(t)}, \tag{8D.6b}$$

$$a^2(t) \triangleq [A + n_c(t)]^2 + n_s^2(t). \tag{8D.6c}$$

Just as in the noiseless case, it is convenient to assume initially that the bandwidth of $H_2(f)$ is large enough to pass

$$r_2(t)\big|_{DF} = a(t)\cos[2\pi f_2 t + \phi(t) - \hat{\theta}(t)] \tag{8D.7}$$

without distortion. This definitely does not occur when $H_2(f)$ is given by Eq. 8D.4, but the effect of narrowing the IF half-bandwidth to $W_m$ cps is most easily determined by first establishing the nature of $\hat{m}(t)$ when $H_2(f)$ is broadband.

When $H_2(f)$ is so wide that

$$r_3(t) = r_2(t)\big|_{DF},$$

we have

$$r_4(t) = \phi'(t) - \hat{\theta}'(t)$$

and

$$\hat{m}(t) = G[\phi'(t) - \hat{\theta}'(t)] * w_m(t) = G[\phi'(t) - 2\pi\hat{W}\hat{m}(t)] * w_m(t).$$
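But $\hat{m}(t) * w_m(t) = \hat{m}(t)$, so that the equation above becomes

$$\hat{m}(t)\,(1 + 2\pi G\hat{W}) = G\,[\phi'(t) * w_m(t)].$$

Substituting $G = [2\pi(W_1 - \hat{W})]^{-1}$ yields

$$\hat{m}(t) = \frac{1}{2\pi W_1}\,[\phi'(t) * w_m(t)]. \tag{8D.8}$$

We now introduce the weak-noise assumption

$$\mathcal{N}_0 W \ll A^2.$$

It follows that, with high probability,

$$\phi(t) \approx -\frac{n_s(t)}{A}. \tag{8D.9}$$

In the absence of anomaly we therefore have

$$\hat{m}(t) \approx -\frac{1}{2\pi W_1 A}\,[n_s'(t) * w_m(t)] \triangleq n(t). \tag{8D.10}$$

Integrating the power density function of $n(t)$ yields

$$\overline{n^2(t)} = \frac{1}{3}\left(\frac{W_m}{W_1}\right)^2\frac{\mathcal{N}_0 W_m}{A^2} \tag{8D.11a}$$

or, in terms of the energy $E_s = A^2/2W_m$ transmitted during a sampling interval of $m(t)$,

$$\overline{n^2(t)} = \frac{1}{3}\left(\frac{W_m}{W_1}\right)^2\frac{\mathcal{N}_0}{2E_s}. \tag{8D.11b}$$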


We see that the assumption that $H_2(f)$ is broadband leads to the same weak-noise suppression provided by conventional FM (Eq. 8.146).

The remaining task is to investigate the weak-noise behavior when $H_2(f)$ is given by Eq. 8D.4. Only the part of $n_s'(t)$ that passes through $W_m(f)$ affects $\hat{m}(t)$. But with weak noise the effect of narrowing $H_2(f)$ is simply to eliminate those spectral components of $n_s'(t)$ that $W_m(f)$ would discard anyway. It follows that $\overline{n^2(t)}$ is unchanged from the value given by Eq. 8D.11 even when $H_2(f)$ is narrow-band.
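The weak-noise output power can be verified by direct integration. The Python sketch below is a rough check under stated assumptions, not the book's own computation: the function name `output_noise_power` and all numerical values are hypothetical, the output noise is taken to be $n_s'(t)$ scaled by $1/(2\pi W_1 A)$ and restricted to $|f| < W_m$, and $n_s(t)$ is assumed to have power density $\mathcal{N}_0/2$. The result is compared with the closed form $\tfrac{1}{3}(W_m/W_1)^2(\mathcal{N}_0 W_m/A^2)$ and with its restatement in terms of $E_s = A^2/2W_m$.

```python
import math

def output_noise_power(A, N0, W1, Wm, n=20000):
    """Integrate |j*2*pi*f / (2*pi*W1*A)|**2 * (N0/2) over |f| < Wm (midpoint rule)."""
    h = Wm / n
    total = 0.0
    for k in range(n):
        f = (k + 0.5) * h
        total += (f / (W1 * A)) ** 2 * (N0 / 2.0)
    return 2.0 * h * total  # factor 2 accounts for negative frequencies

# Illustrative (hypothetical) values; N0 * (W1 + 2*Wm) is much less than A**2.
A, N0, W1, Wm = 1.0, 1.0e-8, 75.0e3, 15.0e3

numeric = output_noise_power(A, N0, W1, Wm)
eq_11a = (1.0 / 3.0) * (Wm / W1) ** 2 * (N0 * Wm / A ** 2)
E_s = A ** 2 / (2.0 * Wm)
eq_11b = (1.0 / 3.0) * (Wm / W1) ** 2 * (N0 / (2.0 * E_s))
print(numeric, eq_11a, eq_11b)
```

The integration limit is $W_m$ rather than the RF half-bandwidth $W$ because the lowpass filter $W_m(f)$ discards every component of $n_s'(t)$ outside the message band.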

Just as with conventional FM receivers, the foregoing analysis can be extended to the quasi-static situation in which $m(t)$ is a slowly varying function of time. The value of $\overline{n^2(t)}$ remains the same as long as the probability of anomaly is negligible.
It appears from a first perusal of the weak-noise behavior in the absence of modulation that only the noise in a bandwidth of $\pm W_m$ cps around the signal contributes to the probability of anomaly. On the other hand, the requirement that the receiver be able to respond to all possible amplitude-bounded input messages $m(t)$ of bandwidth $W_m$, including those that cause $s_m(t)$ to sweep over the full bandwidth $f_0 \pm W$ during a single sampling interval, implies that the full input bandwidth must enter into the determination of $\hat{m}(t)$ when the noise is not weak. We feel that this requirement prevents $P[\mathcal{A}]$ from being smaller than that afforded by FPM with comparable stretch. As mentioned previously, however, we have not been able to construct convincing mathematical arguments that this is so for idealized FMFB receivers with rectangular filters. The practical case where the filters within the closed loop have broad nonzero skirts has been considered on a partly empirical basis and with a somewhat different definition of anomaly in the literature.$^{26}$

PROBLEMS 

8.1 We wish to communicate a random variable $m$ with probability density

$$p_m(\alpha) = \frac{1}{2\sqrt{2\pi}}\left[e^{-(\alpha - 1)^2/2} + e^{-(\alpha + 1)^2/2}\right]$$

over an additive white Gaussian noise channel by means of the linearly modulated signal

$$s_m(t) = mA\,\varphi(t); \quad \int\varphi^2(t)\,dt = 1.$$

a. Estimate the minimum attainable mean-square error when $\mathcal{N}_0/(2A^2) \ll 4$.

b. Sketch $p_m(\alpha \mid r = \rho)$ for several typical values of $\rho$ when $\mathcal{N}_0/(2A^2) = 16$. Discuss the validity of mean-square error as a performance criterion in this case; suggest and discuss alternative criteria.

8.2 Consider a communication channel with $r = mA + n$, in which $r$, $m$, and $n$ are random voltages and

/>*(«) Pn(P) *= Yb 6 ^' 

Assume $m$ and $n$ are statistically independent.

a. Determine as a function of $A/b$ the mean-square error of a maximum likelihood receiver; a maximum a posteriori probability density receiver; and a receiver which ignores the channel output and sets $\hat{m} = 0$.

b. What is the minimum mean-square error decision rule when the maximum-likelihood rule is indeterminate?
8.3 Consider communicating two correlated Gaussian random variables, $\mathbf{m} = (m_1, m_2)$, with

$$p_{\mathbf{m}}(\boldsymbol{\alpha}) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\exp\left[-\frac{\alpha_1^2 - 2\rho\alpha_1\alpha_2 + \alpha_2^2}{2(1 - \rho^2)}\right],$$

over an additive white Gaussian noise channel by means of two different systems. System A transmits

$$s_{\mathbf{m}}(t) = m_1 A\,\varphi_1(t) + m_2 A\,\varphi_2(t),$$

where $\varphi_1(t)$ and $\varphi_2(t)$ are orthonormal, and estimates each $m_i$ independently by means of two uncoupled one-dimensional minimum mean-square error receivers.

System B transforms $\mathbf{m}$ into two uncorrelated random variables, say $\tilde{\mathbf{m}} = (\tilde{m}_1, \tilde{m}_2)$, and transmits

$$s_{\mathbf{m}}(t) = \tilde{m}_1 A\,\varphi_1(t) + \tilde{m}_2 A\,\varphi_2(t).$$

The receiver of system B determines $\hat{\mathbf{m}}$ by first making a minimum mean-square error estimate of the vector $\tilde{\mathbf{m}}$ and then applying thereto the transformation inverse to that of the transmitter.

Evaluate and compare the total mean-square error produced by each system 
as a function of the total mean energy transmitted. 

8.4 Consider the waveform 


s(t) = 2 s t ’M'X 

3 = 1 


Wjil) = w\t 


in which 


a. Prove that, for any r, we also have 


, . A sin 2nWt 

y>(t) = 

lirWt 


b. What assignment of the fa) approximately maximizes ^(0) when r = 1/4 IT 
and we_ require |^| < 1 for all j7 

8.5 Consider the random process

$$n(t) \triangleq \lim_{K\to\infty}\sum_{j=-K/2}^{K/2} n_j\,\psi_j(t),$$

in which the $\{n_j\}$ are statistically independent, zero-mean Gaussian random variables and the $\{\psi_j(t)\}$ are as defined in Problem 8.4. Prove that $n(t)$ is a stationary Gaussian process with power density function

$$S_n(f) = \begin{cases} \mathcal{N}_0/2; & |f| < W \\ 0; & \text{elsewhere,} \end{cases}$$

as claimed in Eq. 8.50.
8.6 The envelope detector illustrated in Fig. P8.6 is to be used to demodulate the DSB (voltage) signal

$$s(t) = A[1 + \cos 2\pi Wt]\cos 2\pi f_0 t; \quad f_0 \gg W.$$

a. Sketch the waveshape of the current through the diode.

b. Determine the largest permissible value of the time-constant, $RC$.

c. Consider the detector output when $RC$ is adjusted in accord with (b), and estimate the ratio of the Fourier components at the frequencies $W$ and $f_0$ as a function of $f_0/W$.



Figure P8.6 


8.7 Appendix 8B contains a proof that the least mean-square linear filter produces an output error, $e(t)$, that is uncorrelated with the input. This does not imply that some nonlinear operation might not afford a smaller value of $\overline{e^2}$; in general it will. When the input $x(t)$ is a zero-mean Gaussian process, however, linear filtering is best. Prove that this is true. Hint. Let $\mathbf{x}$ represent any set of observations of $x(t)$, let the desired output be $z(t)$, and let $f(\mathbf{x})$ denote an arbitrary nonlinear estimate of $z(t)$. Expand $\overline{[z(t) - f(\mathbf{x})]^2}$ in terms of the optimum linear estimate $\hat{z}(t)$, as in the proof of Eq. 8B.10, and show that the crossterms vanish.

8.8 Let $x$, $y$, and $z$ be three zero-mean random variables with known variances and covariances.

a. Specify the constants $a$ and $b$ for which

$$\hat{z} = ax + by$$

is the least mean-square error linear estimate of $z$, based on $x$ and $y$.

b. Let $\tilde{x}$ and $\tilde{z}$ be the least mean-square error linear estimates, based only on $y$, of $x$ and $z$. Without using the explicit parameter values specified in (a), show that

$$\tilde{z} = a\tilde{x} + by.$$

c. We may generalize (b) to the case in which $x(t)$, $y(t)$, and $z(t)$ are zero-mean, jointly wide-sense stationary random processes. Let

$$\hat{z}(t) = \int_{I_x} x(\alpha)\,h(t - \alpha)\,d\alpha + \int_{I_y} y(\alpha)\,g(t - \alpha)\,d\alpha,$$

$$\tilde{x}(t) = \int_{I_y} y(\alpha)\,f(t - \alpha)\,d\alpha; \quad t \text{ in } I_x,$$

denote optimum linear estimates of $z(t)$ and $x(t)$. Without solving for the functions $h$, $g$, or $f$, prove that

$$\tilde{z}(t) = \int_{I_x}\tilde{x}(\alpha)\,h(t - \alpha)\,d\alpha + \int_{I_y} y(\alpha)\,g(t - \alpha)\,d\alpha$$

is the optimum estimate of $z(t)$ based only on knowledge of $y(t)$ within the interval $I_y$.

8.9 A Poisson impulse train $x(t)$ (see Problem 3.15), consisting of impulses occurring at an average of $m$ per second, is the input to a realizable linear filter with impulse response $g(t)$ and output $y(t)$. The area of the $i$th impulse is a zero-mean, unit variance random variable and is statistically independent of the areas and times of occurrence of all other impulses. Determine the optimum realizable linear filter for predicting $y(T)$, $T > 0$, from $y(t)$, $t < 0$, when the criterion is to minimize the mean-square error $\overline{e^2} = \overline{[z(0) - \hat{z}(0)]^2}$. Here $z(0) = y(T)$ is the desired output, and $\hat{z}(0)$ is the output of the predictor at the time $t = 0$. Assume that the filter $g(t)$ possesses a realizable inverse, $g^{-1}(t)$. Discuss the optimum solution. In particular, consider the contribution of an impulse in $x(t)$ to both $y(T)$ and $\hat{z}(0)$ if

(i) the impulse occurs prior to $t = 0$,

(ii) the impulse occurs after $t = 0$.

(Bode and Shannon$^{14}$ treat problems of this sort in detail.)

8.10 A random variable $m$, with density function

$$p_m(\alpha) = \begin{cases} \tfrac{1}{2}; & |\alpha| \le 1 \\ 0; & \text{elsewhere,} \end{cases}$$

is communicated over an additive white Gaussian noise channel by means of the signal

$$s_m(t) = a(m)\,\varphi_1(t) + b(m)\,\varphi_2(t),$$

in which

$$\sqrt{a^2(m) + b^2(m)} \triangleq \rho(m) = 10m + 11,$$

$$\tan^{-1}\frac{b(m)}{a(m)} \triangleq \theta(m) = 5\pi(m + 1).$$

a. Sketch the locus of the signal vector $\mathbf{s}_m$.

b. For weak noise, sketch the approximate behavior of the conditional mean-square error as a function of $m$ when a maximum-likelihood receiver is used.

c. Estimate the probability of anomaly that results when $m = 0$ and the noise power density is $\mathcal{N}_0/2 = 2$.

8.11 Consider an FPM system with 

(AV2sin2TT(f 0 + mW 0 )t; ~T < t < T, 

s m\0 ~ | „ ' . , 

elsewhere, 
operating over an additive white Gaussian noise channel with power density 
XJ2. Assume T = * 10" 3 , A = 200 V 2) JCJ2 - 1, N < 1. maximum- 

likelihood reception. 

a. Estimate l 5 if we require PM ** 10 “ 3 1 10 *• 
b Estimate PM if we require e a 10 -4 ; 10 -5 . 

c. Determine (approximately) the minimum value of A such that by appro- 
priate choice of W 0 we can simultaneously achieve « a » 10 4 and PM ~ 10 • 
What is the corresponding value of fV 0 ? 

8 12 A random variable m with/?* uniform over (■ - 1 , 1] is communicated over 
an additive white Gaussian noise channel by means of the foUowtngstg 
scheme which uses M + 1 orthonormal functions J ~ - • •’/'?; ine 

transmitter first determines the unique integer / and continuous variable m, 
— which satisfy the equation 


2/ + 1 + m 


i =0, 1 M- 1. 


The signal 


V E s — E <Pi(t ) + lfl ^ 


is then transmitted. The receiver reconstructs the value of m from maximum- 
likelihood estimates of i and m. 

a. Describe the signal locus. What is its total length? What is the value of 

b. Define the event “anomaly” and determine an upper bound on its probability.

c. Upper bound the mean-square error in the absence of anomaly.

d. Compare the system’s performance with that of ordinary PPM. In particular, comment on the significance of freedom to choose the energy $E$ anywhere in the interval $[0, E_s]$.

8.13 Assume that the phase-and-frequency FPM signal $s_m(t)$ of Eq. 8.121 is 
used totansmit the vahS « - J over an additive 

Determine all other values of $m$ in the interval $-1 \le m \le 1$ which produce a signal $s_m(t)$ orthogonal to the one actually transmitted. Consider intermediate values of $m$ and justify estimating $P[\mathcal{A}]$ by Eq. 8.122c.

8 14 An antipodal FPM system is used to communicate a stationary lowpass 
Gaussian process with poweJ density JLJ2 = * and bandwidth IV Assume that 
theTvafiablTenergy-tomoise ratio per sample is BJ * o - 15 and that we require 
a receh/er output signal-to-noise ratio (in the absence of anomaly) equal to 0 4 
What is the minimum transmission half-bandwidth such that the probabili y 
^^nVaSfelling outside the transmission band is less than the prob- 
Sty of anomaly? Hint. Determine how PM is affected by relaxation of the 

constraint |wi| < 1. 


b. Because of the curvature of the signal locus, a noise vector of length $|\mathbf{n}|$ may introduce an error in the maximum-likelihood estimation of $m$ that is greater than is accounted for by the linearized analysis. Show that the error in the situation illustrated in Fig. P8.15 is 

V~E S 

(m - m) = 0, 

in which 0 is the solution to the equation 

ln| = 2 V£ s sin|^ V£ s 0^1 - — 

For small 0 show that 


0 « 


l n l 

vF s 



c. To a first approximation we may consider the relevant component of $\mathbf{n}$ as a one-dimensional vector with variance $\mathcal{N}_0/2$. Show that the resulting weak-noise estimate of the mean-square error in the absence of anomaly is 





The output noise increment attributable to the local curvature of s m is called 
“quasi-Gaussian.” 


sin 2irfot; f 0 »W m 



Center Center 

frequency 2/o frequency fy 

(ii) 

Figure P8.16 
8.16 The DSB-SC modulator/phase shift combination shown in Fig. P8.16(i) produces an output signal $s(t)$ called “quadrature modulated.” 

a. Show that s(t) closely approximates a PM signal when G is A or less and 

b? The modulation index of the PM signal cos 2n [f 0 t + is defined as 

8~2itWo. Why? What is the value of 0 for 
c. What does the modulation index become if $s(t)$ is first applied to a frequency doubler and then heterodyned back to center frequency $f_0$, as shown in Fig. P8.16(ii)? (The half-bandwidth of each filter is much greater than $W_m$ but less 


d Assume that there are k stages of doubling and frequency translation and 
that the resulting signal is amplified and transmitted over an additive white 
Gaussian noise channel with XJ2 = HT* joule. What value of k is required 
if we use maximum-likelihood reception and require n\t) = 10 4 Onjhe absence 
of anomaly); assume that W m = 3 kc and the received power is P , - -10 watt 
e. Show how the system analyzed can be modified to provide a broad-band FM signal. (Armstrong$^{3}$ generated signals this way.) 

8.17 Consider a lowpass modulating signal $m(t)$ of $T$-sec duration. Show that FM and PM yield equivalent total mean-square error (in the absence of anomaly) when maximum-likelihood receivers are used and the transmitted bandwidth is normalized in accordance with Eq. 8.164. Hint. Expand the FM modulating signal (approximately) in an $N$-term orthonormal Fourier series analogous to Eq. 8.152b and use the fact that

$$\sum_{n=1}^{N} n^2 = \frac{N(N + 1)(2N + 1)}{6}.$$


8.18 Assume that the weak-noise FMFB analysis of Appendix 8D remains valid as the modulation index $W_1/W_m$ is increased indefinitely, with signal power $A^2$ and noise power density $\mathcal{N}_0/2$ held constant. Show that such an assumption violates the channel-capacity constraint of Eq. 8.191. 

8.19 An idealized phase-lock PM receiver is illustrated in Fig. P8.19. The bandpass filter $H_1(f)$ is broad enough to pass $s_m(t)$ with negligible distortion. 



Figure P8.19 
Assume that the modulating signal $m(t)$ is the output from an ideal lowpass filter of bandwidth $W_m$ and that 

sjt) = A V2 cos 27 r[f 0 t + Wojn(t)l j 

s(t) = V2cos27r[/ 0 r + W<W 2 . 4 

a. $H_2(f)$ represents another ideal (rectangular passband) filter. Determine the smallest bandwidth for $H_2(f)$, and the value of the attenuator gain, such that $\hat{m}(t) = m(t)$ in the absence of noise. 

b. Determine the mean-square value of the output noise ri\i) when the multi- 
plier input noise is weak; that is, when n^(j) « A 2 . 

8.20 A conventional PCM system, with binary antipodal signaling and without error-correcting coding, is used to communicate an ideally band-limited 3-kc voice signal, $m(t)$, over a system that incorporates 20 repeaters. At each repeater a decision is made on each bit and the PCM wave is reconstructed before retransmission. Each link is disturbed (independently) by additive white Gaussian noise with $\mathcal{N}_0/2 = 10^{-10}$ and the maximum allowable transmission level produces $10^{-4}$ watt of received signal power. Assume that the largest tolerable probability of anomaly (per speech sample) is $10^{-4}$ and that $|m(t)| \le 1$. What is the smallest attainable value of mean-square error per sample? 

8.21 The voice signal of Problem 8.20 is now to be communicated over a single infinite-bandwidth, additive white Gaussian noise channel with $\mathcal{N}_0/2 = 10^{-3}$ joule.

a. Determine the minimum value of peak transmitter power (in watts) required to obtain

$$\overline{[\hat{m}(t) - m(t)]^2} \le 10^{-5}$$

when PCM is used in conjunction with block coding and maximum-likelihood decoding.

b. Repeat (a) for convolutional coding and sequential decoding. 
signal decomposition, 502 
theorem, 27 
theorem, 321 
406, 555 

discrete memoryless, 390, 462 
filtered signal, 485 
multivector, 219 
null-zone (BSEC), 401, 557 
random amplitude, 508 
random phase, 511 
Rician fading, 580 
vector, 212 
very noisy, 480 
factoring of, 166 
Characteristic function, joint, 156 
of Gaussian variables, 93, 164 
Chebyshev’s inequality, 95, 98 
Chernoff bound, 97, 128, 394 
exponentially tight, 101, 398 
Chi-square density function, 362 
Closure property, convolutional codes, 422 
parity check codes, 379 
Code, 299 
base, 312, 367 
selection, 304, 314, 476 
tree, 413 
Coded signal sequences, binary, 298, 304 
multilevel, 309 
orthogonal, 358 
phase-shift, 360 
Coder, 365 

connection coefficients, 372, 409 
convolutional, 409, 468 
maximal length shift register, 386 
orthogonal signal, 381 
parity check, 369 
table storage, 367 
Coin tossing, 15 
Complexity, 364 
Component accuracy, 244 
Computation, bound, 440, 444, 473, 
483 

speed, 405 

Conditional, density, 65 
mean, 149, 542, 585 
mixed expressions, 74 
probability, 29 
variance, 149, 585 
Congruent decision regions, 263 
Constraint span, coder, 370, 412 
Convexity, 563 

Convolution, by linear filters, 135 
of density functions, 72 
of power spectra, 202 
Convolutional codes, 405 
binary encoder, 409 
closure, 422 
constraint span, 412 
general encoder, 468 
generator, 416 


Convolutional codes, linearity, 413 
suboptimum decoding, 417 
tree structure, 412 
Correlation, function, 174 
of signal(s), 238, 636 
output of linear filter, 180 
receiver, 234 
Cost, system, 364 
Covariance, 151, 172 
coefficient, 151 
function, 173 
matrix, 159 

Criterion of goodness, 582 
Critical rate, 348 
Crosscorrelation function, 187 
Cross talk, 508, 575 

Decibel, 250 

Decision, regions, 79, 214, 409, 510 
rule, 35, 77 

Decoder, Bose-Chaudhuri-Hocquenghem, 441 
sequential, 425 
suboptimum, 417 
threshold, 441 
Degrees of freedom, 376 
Demodulation, single-sideband, 505 
synchronous, 493 

Density function, Cauchy, 48, 89, 126 
chi-square, 362 
conditional, 65 
exponential, 47, 62 
Gaussian, 49, 53, 93 
joint, 49 

multidimensional, 57 
probability, 45 
Rayleigh, 47, 64 
uniform, 48, 89 
Detection, envelope, 517, 610 
list of L, 481 

Difference frequency (DF), 695 
Differential phase-shift keying, 527 
Differentiating filter, 651, 693 
Dimensionality theorem, 294, 348; see 
also Effective dimensionality 
Discriminator, FM, 650 
Disjoint decision regions, 214 


Disjoint events, 18 
Disjoint filters, 388 
Distance, decoding, 459 
Euclidean, 217 
Hamming, 406, 477 
tilted, 431, 463 
Distribution function, 38 
joint, 40 

Diversity, 219, 533, 560 
optimum, 549, 569 
Double-sideband (DSB), 609 
Double-sideband suppressed carrier 
(DSB-SC), 493, 512, 609 
comparison with SSB, DSB, 507, 610 

Effective dimensionality, FPM, 642 
phase-and-frequency modulation, 645 
PPM, 626, 640 

Elimination of random variable, 55 
Empirical average, 84, 96 
Energy in signals, 238 
maximum, 594, 614 
mean (average), 587 
minimum, 247 
peak, 247 
per bit, 288, 307 
per sample, 605 
Entropy function, 103, 484 
Envelope, 518, 609 
Envelope detection, DSB, 610, 700 
phasor diagram, 523 
PPM, 639 

random phase channel, 517 
Equivalent signal sets, 246 
Error detection, 9, 117, 453 
Error probability, antipodal signals, 
250 

biorthogonal signals, 263, 266 
DPSK, 527 

hypercube vertices (bit-by-bit signal- 
ing), 257, 290 
L-diversity, 544, 547 
of convolutional code, 416 
orthogonal signals, 250, 258, 266 
incoherent, 533, 577 
Error probability, random phase, 523 
simplex signals, 261 



Event, 16 
complement, 18 
disjoint, 18 
intersection, 18 
joint, 40 
null, 18 
union, 18 
Expected value, 84 
linearity, 87 

of exp { u-a 2 } when a- is Gaussian, 
522, 570 
of product, 88 
Experiment, compound, 24 
random, 13 

Exponential bound parameter, $R_0$, $A$-letter input alphabet, $Q$ finite, 396 
$Q = \infty$, 316, 397 
$A$-level amplitude modulation, $Q = A$, 403 
$A$-level phase modulation, $Q = \infty$, 360 
binary antipodal, $Q = \infty$, 303 
BSC, $A = Q = 2$, 399 
BSEC, $A = 2$, $Q = 3$, 401 
fading channel, $A = 2$, $Q = \infty$, 554 
$A = 2$, $Q = 2$ or 3, 557 
optimization, 319, 354 
saturation, 311 
Shannon bound, 310 
very noisy channel, 480 
Exponential error bounds, block-ortho- 
gonal signals, 291 
heuristic discussion, 305 
see also Exponential bound para- 
meter, Reliability function 
Exponential optimality, 101, 309, 346, 
347 

Expurgation of codes, 348, 381 

Fading channel, 508, 527 
with coding, 550 
Fail-safe two-way strategy, 455 
Fano algorithm, 431, 462 
implementation, 438, 453 
Feedback receiver, FM, 665 
weak-noise performance, 694 
Feedback systems, 454, 462 
Filtered impulse noise, 144, 164, 209 
Filtered-reference receiver, 488, 491 
-signal receiver, 486, 491 
Frequency modulation (FM), 645 
conventional receiver, 650 
feedback receiver, 665 
Fourier series, 647 
maximum-likelihood reception, 705 
modulation index, 646 
probability of anomaly, 661 
quasi-static approximation, 654 
signal bandwidth, 646, 659 
weak noise suppression, 649 
Frequency-position modulation 
(FPM), 

guard band, 643 
maximum-likelihood receiver, 644 
optimality, 668 

phase-and-frequency modulation, 644, 665 

probability of anomaly, 645, 664, 
668 

RF half-bandwidth, 659 
single parameter, 642 
waveform input, 643 
Frequency translation, 492, 607 

Gaussian density function, 49, 93 
characteristic function, 93, 164 
conditional, 69, 149 
contour plots, 150 
definition by linear combination, 210 
joint, 52, 148, 153, 156, 170, 210 
linear transformations, 153, 210 
moments, 94, 205 
properties, 155, 156, 164, 210 
Gaussian process, 171 
joint, 186 

linearly filtered, 177 
specification of, 172 
stationary, 175, 184 
Generator — convolutional code, 416 
Genie, 419, 466 

Gram-Schmidt procedure, 240, 266, 
616 

Group code, 380 
Guard band, FPM, 643 
PPM, 640 


Heterodyne, 492 

Hypercube vertices, 254, 320, 407 
Hypersphere, 320, 323 
volume, 330, 355 

Impulse function, 46 

Fourier transform of, 73 
train, 207, 679, 701 

Incoherent receiver, 519 
Incorrect subset, 464 
Instantaneous phase, 645 
frequency, 645 

Integrate and dump circuit, 237 
Interpolation function, 678 
Intersymbol interference, 295, 358, 457 
Invariance of error probability, to 
choice of base, 232 
to message, 263, 378, 417 
to rotation and translation, 246 
Inverse, filter, 242 
matrix, 197 
operation, 222 

Intermediate frequency (IF) filter, 
650, 696 

Irreducible error, 691 
Irrelevance, theorem of, 220 

Jacobian, 112 

Karhunen-Loeve expansion, 598 
Kronecker delta, 165 

Lens-shaped region, 335 
Letters, alphabet, 312 

probability assignment, 313 
Limiter-discriminator, 650 
Linear filters, 177 

ideal lowpass, 191, 493, 598 
ideal bandpass, 492, 650 
Linearity, convolutional coder, 413, 422 
of expectation, 87 
parity check coder, 380 
Linear modulation, sequence of para- 
meters, 594 

single parameter input, 583 
waveform input, 598 
List decoding, 481, 578 


Majority rule decision, 105, 117 
Marcum Q-function, 579 
Mass density interpretation, 55 
Matched filter, 235, 282 
and component accuracy, 244 
contrast with inverse filter, 242 
frequency domain interpretation, 242 
integrate and dump, 237 
signal-noise interpretation, 239 
Matrix, covariance, 159 
diagonal, 198 
identity, 198 
inverse, 197 
properties, 192 

Maximum a posteriori probability re- 
ceiver, 213 

Maximum-likelihood receiver, discrete 
communication, 214, 264 
waveform communication, 589, 621, 670 
FPM, 644 
PM, 656 
PPM, 623, 638 
Mean function, 172 

at output of linear filter, 179 
Mean-integral-square error, 603 
Mean-square error, 542, 582, 689 
optimum linear filtering, 683, 700 
per component, 596 
total, 633 

Measurement of channel, 524, 542, 580 
Message process, Gaussian, 660, 683 
Minimax decision rule, 118, 264, 381, 
587 

Mismatched receiver, 244 
Modulation index 

DSB, 609 
FM, 646 

Modulator, 233, 365 
double-sideband suppressed carrier, 
493, 609 
quadrature, 705 
single-sideband, 505 
vestigial sideband, 576 
see also Frequency modulation, Linear modulation, 
Frequency-position modulation, Phase modulation, 
Pulse-position modulation 
Moments, 88 
central, 88 

of Gaussian variable, 94 
Multimodal density functions, 618 
Multiplexing, frequency, 507, 597 
quadrature, 504, 640 
time, 597 

Mutually exclusive, 14 

Narrowband noise, 503, 575 
Noisy amplifier, 145 
Nonwhite noise, 489 
Null-zone channel, see Binary sym- 
metric erasure channel 

Optimum decision regions, 214 
receiver, 81, 211 

Optimum mean-square filtering, linear, 
683 

with joint Gaussian statistics, 700 
Orthogonal signals, 250 
fading channel, 533 
incoherent channel, 519 
see also Block orthogonal signals, 
Error probability 
Orthonormal base, 233 
functions, 224, 365 
Outcome, 14 
Overflow, 447 
probability, 449 

Paley-Wiener criterion, 489 
Pareto distribution, 446, 483 
Parity-check coder, 370 
multiamplitude codes, 376 
Parseval equations, 238 
Phase, comparison receiver, 524, 580 
reference in PPM, 639 
Phase modulation (PM), 655 
maximum-likelihood reception, 656 
RF half-bandwidth, 659 
weak noise suppression, 656 
Poisson random variable, 125 
impulse train, 207, 208, 701 
Polar transformation, 63, 114 
Power, transmitter, 287 
peak, 318 
Power spectrum (density function), 181, 184 

bilateral frequency basis, 189 
cross power density function, 686 
narrowband noise, 500 
Pre-emphasis FM, 655; see also Phase modulation 

Principal diagonal, 198 
Probability measure, 16 
a posteriori, 34 
a priori, 34 
conditional, 29 
error, 132 
joint, 31 

transition, 33, 391 
see also Distribution function, Error probability 

Pulse amplitude modulation (PAM), 
597, 607 
with PCM, 702 

Pulse-code modulation (PCM), 674 
conventional, 675 
with error correction, 676 
with PAM, 702 

Pulse-position modulation (PPM) 
antipodal, 637 
arbitrary pulse, 634 
maximum-likelihood receiver, 623, 638 

probability of anomaly, 629, 636, 
665 

quadrature multiplexing, 640 
single parameter, 623 
waveform communication, 639 

Q( ) function, 49 
bounds, 82 

see also Marcum Q-function 
Quadratic form, 159, 197 
Quantization, increasingly fine, 396 
on fading channel, 555, 578 
PCM, 674 
with coding, 386 
without coding, 122, 128 
see also Binary symmetric channel, 
Binary symmetric erasure chan- 
nel, Exponential bound parameter 
Quasi-Gaussian noise, 704 


Radio-frequency (RF), 607 
Randomness, 12 

Random coding, 299, 312, 333, 424 
and pairwise independence, 369 
see also Exponential bound para- 
meter, Code selection 
Random process, 131 
Gaussian, 171 
linearly filtered, 135, 179 
nonstationary, 143 
specification of, 133 
stationary, 135 

see also Stationary random process 
Random variable, 37 
complex, 91 
equality, 58 
vector, 41 

Rate, source, 286 
convolutional code, 411 
critical, 347 
expurgation, 348 
Rayleigh, fading channel, 533 
random variable, 47, 64 
Receiver, correlation, 234 
filtered reference, 486, 491 
filtered signal, 488, 491 
implementation with decoder, 387 
incoherent, 519 

linearly modulated waveforms, 604 
matched filter, 234 
maximum-likelihood, continuous var- 
iable, 621 

see also Maximum-likelihood receiver 
Rectangular decision regions, 248 
Rectifier, 61 

Relative frequency, 14, 19, 85, 97 
conditional, 25 

Reliability function, block-orthogonal 
signaling, 342, 361, 667 
generic form, 347 
upper and lower bound, 346 
see also Exponential bound para- 
meter 

Repeat-request, 455 
Result, 14 

Reversible matrix transformation, 168 
Reversibility, theorem of, 220, 488 




Rotation, of coordinates, 155 
of signal space, 246 
Sample mean, 96, 126 
Sample space, 16 
functions, 131 
point, 16 
Sampling, 598 
functions, 601 
of processes, 601 
theorem, 599, 678 
Saturating transducer, 591 
Saturation of signal class, 311 
Scattering, 529 
Schwarz inequality, 240 
Semi-invariant moment generating func- 
tion, 127 

Sequential decoding, 425; see also 
Fano algorithm 
Sequential source, 285 
Shift register, 366 

maximal length coder, 385, 478 
Shot noise, 147, 209 

in transistors and diodes, 209 
in triode amplifier, 147 
Sideband, 493 
Signals, antipodal, 249 
arbitrary correlation, 283 
biorthogonal, 261 
hypercube-vertex, 254, 288 
minimum energy, 247 
orthogonal, 250, 288 
simplex, 260 
Signal space, 225 

rotation and translation, 246 
Signal-to-noise ratio, 84, 588, 690 
at matched filter output, 239 
Simplex signals, 260, 283 
as parity check code, 381 
as shift register sequence, 385, 478 
Sine-integral function, 295 
Single-sideband, 505, 609 
comparison with DSB-SC, DSB, 507, 
610 

Singular density function, 152, 168 
matrix, 168 

Sphere hardening, 323, 484, 670 
packing, 332 
Spherical symmetry, 232 
Standard deviation, 89 
Starting node, 417 
Stationary random process, 135 
Gaussian process, 175, 184 
jointly stationary, 187 
nonstationary, 143 
random-phase periodic process, 140 
wide-sense stationary, 183 
Statistical independence, of events, 31 
of events and vectors, 76 
of functions of independent vectors, 
77 

of random processes, 188 
of random variables, 70 
pairwise, 32, 90 
Statistical regularity, 13 
Stirling’s approximation, 362 
Stretch, 611 
FM, 658 
FPM, 644 

phase-and-frequency modulation, 644 

PPM, 635 

uniform, 614, 620, 669 
Synchronization, 456, 645 
Synchronous demodulation, 493 
Synthesis of waveforms, 223 
System comparison, FM, FPM, PM, 
probability of anomaly, 664 
weak noise, 659 

Table storage of code, 368 
Telephone, data communication, 309, 457 

Theorem of total probability, 31 
Threshold, 81, 616 
decoding, 441 
Fano algorithm, 431 
in PPM, 627, 633, 636 
quantizer, 401 

see also Anomalous reception 
Time-bandwidth product, 294, 348, 683 

Transformation of variables, 58 
addition of constant, 59 
half-wave linear rectifier, 61 
iterated, 62 
multiplication by constant, 60 
product of variables, 69 
quadratic rectifier, 61 
reversible, 311 
sum of variables, 68 
Transition probabilities, 33, 391 
Transmitter implementation, 223, 365; 

see also Coder 
Trial, 13, 24 

Twisted modulation, 613, 670, 703; 
see also Stretch, Anomalous re- 
ception 

Two-way strategies, 454, 462 

Uncoded transmission, 404 
Undetectable errors, 448, 453 
Union bound, 264, 301, 419 
Unrealizable filters, 686 

Variability of computation, 446 
Variance, 89 
Vector, 41, 193 
Vector receiver, 233 


Venn diagram, 18, 265 
Very noisy channel, 480 
Vestigial sideband, 576 

Waiting-line, 449 

Waveform communication, see Linear modulation, 
Frequency modulation, Frequency-position modulation, 
Phase modulation, Pulse-code modulation, 
Pulse-position modulation 

Weak law of large numbers, 96, 204 
Weak noise suppression, 614 
asymptotic bound, 674 
in FM — feedback receiver, 694 
in FM — conventional receiver, 649 
in FPM, 642 
in PPM, 625 

Weight of binary vector, 407 
Whitening filter, 489, 502, 561 
White noise, 188 
Wiener-Hopf condition, 686 

Zero-crossing, 663, 691 



